(54) Process for the development of binding mini-proteins

(57) The invention concerns a process for identifying proteins with a desired binding activity against a
target, said process comprising

(a) screening, for binding activity against said target, a population of genetic packages, each package
displaying a potential binding domain, said population collectively displaying a plurality of different
potential binding domains, said domains differing at one or more variable amino acid positions,
each said potential binding domain being a micro-protein sequence of less than forty amino acids and
having a single disulfide bond between a first amino acid position and a second amino acid position
thereof, the amino acids at said first and second positions being invariant cysteines in the potential
binding domains displayed by said population, and
(b) identifying a protein having the desired binding activity against said target.
Description

BACKGROUND OF THE INVENTION 

Field of the Invention 
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pooil Tt,. invon^on relates to --•«'P«f novo^J^^^^^^^ 

teJe process of mutagenesis. expressU^n. J^^^^^f,; Zl m^genesis of a limited numt,er of pr^ 

n„nl-proteln potential binding doman. f ^,9^^ ^ "^^^^ I^Tng diiSric expression product to l>e dis- 
determlned codons. IB fused to a genetic elen^nt v^^^^ 
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target molecules. i„„k.tino the seauence of a protein In order, e^. to alter Its binding 

[00031 "Protein engineerintf- is ttie art of manipulating the sequence oi a ^ ,^gn,^8urface8 has proved 
™L-»«« Th«^orsaf»ecUnaprotelnblndlngarelaiovm.butdeslgningnewcompB^^^ „r„t«ina can be 

d;";;i^;i^ho^aL(QUI087)suggestJisun.M^^^^ 

constructed with binding properties superior to ^'fJ^J ^'^^^^^^^^ witdnson et al (W1LK84) reported 

!^^jrr^~-e%S^^ 

mutatingthe gene encodingthenaBve protein and thenexp^^ing^^^^^ 

SSMr™.e,s,andommutagenes«bymear»ofrel^^^^^ 
STartorch'emlc^ agents. SeeHo^aL(HOCJ^^^ 

[00071 It is possible to randomly ^^'V P'^^'^^^LIPSr^^ o^^^ in the mixture, for each position of 

of a nucleic acid synthesis procedure. (OLiP86 ^^^Jno^W occur In the polypeptides expressed from the 

;^^xrro;.K^^^^^ 
?s^?:rrd^j^rravepub..h^a^^^^^^ 

C«osi.transportprotelnl^BolE«!l(FER^^ 

rerr2p"jL:rx,rp^^^^^ 

ta^^i^'var.^onwasseeninthedeg.e.^^^^^ 

Lre was no selectionforafiinlty to a target molecule not ^ adapted to devel- 

soughtorfound.FERE64specu.^odm«^^^^^^^^^^ 
opment of similar mutams ot oin«. "^7" LI .ue cell surface. Ferend-s mutant surface prolans «uu,u 

r^ult in the relocation of an intracellular bactenalpro^^ S,Md^ oogenous or heterologous binding domain. 
does not suggest that surface residues should be preferentially varied. In consequence, Ferenci's selection system
is much less efficient than that disclosed herein.

[0012] A number of researchers have directed unmutated foreign antigenic epitopes to the surface of bacteria or
phage, fused to a native bacterial or phage surface protein, and demonstrated that the epitopes were recognized by
antibodies. Thus, Charbit et al. (CHAR86a,b) genetically inserted the C3 epitope of the VP1 coat protein of poliovirus
into the LamB outer membrane protein of E. coli, and determined immunologically that the C3 epitope was exposed
on the bacterial cell surface. Charbit et al. (CHAR87) likewise produced chimeras of LamB and the A (or B) epitopes
of the preS2 region of hepatitis B virus.

[0013] A chimeric LacZ/OmpB protein has been expressed in E. coli and is, depending on the fusion, directed to
either the outer membrane or the periplasm (SILH77). A chimeric LacZ/OmpA surface protein has also been expressed
and displayed on the surface of E. coli cells (WEIN83). Others have expressed and displayed on the surface of a cell
chimeras of other bacterial surface proteins, such as E. coli type 1 fimbriae (HEDE89) and Bacteroides nodosus type
1 fimbriae (JENN89). In none of the recited cases was the inserted genetic material mutagenized.
[0014] Dulbecco (DULB86) suggests a procedure for incorporating a foreign antigenic epitope into a viral surface
protein so that the expressed chimeric protein is displayed on the surface of the virus in a manner such that the foreign
epitope is accessible to antibody. In 1985 Smith (SMIT85) reported inserting a nonfunctional segment of the EcoRI
endonuclease gene into gene III of bacteriophage f1, "in phase". The gene III protein is a minor coat protein necessary
for infectivity. Smith demonstrated that the recombinant phage were adsorbed by immobilized antibody raised against
the EcoRI endonuclease, and could be eluted with acid. De la Cruz et al. (DELA88) have expressed a fragment of the
repeat region of the circumsporozoite protein from Plasmodium falciparum on the surface of M13 as an insert in the
gene III protein. They showed that the recombinant phage were both antigenic and immunogenic in rabbits, and that
such recombinant phage could be used for B epitope mapping. The researchers suggest that similar recombinant
phage could be used for T epitope mapping and for vaccine development.

[0015] None of these researchers suggested mutagenesis of the inserted material, nor is the inserted material a
complete binding domain conferring on the chimeric protein the ability to bind specifically to a receptor other than the
antigen combining site of an antibody.

[0016] McCafferty et al. (MCCA90) expressed a fusion of an Fv fragment of an antibody to the N-terminal of the pIII
protein. The Fv fragment was not mutated.

[0017] Parmley and Smith (PARM88) suggested that an epitope library that exhibits all possible hexapeptides could
be constructed and used to isolate epitopes that bind to antibodies. In discussing the epitope library, the authors did
not suggest that it was desirable to balance the representation of different amino acids. Nor did they teach that the
insert should encode a complete domain of the exogenous protein. Epitopes are considered to be unstructured peptides
as opposed to structured proteins.

[0018] Scott and Smith (SCOT90) and Cwirla et al. (CWIR90) prepared "epitope libraries" in which potential
hexapeptide epitopes for a target antibody were randomly mutated by fusing degenerate oligonucleotides, encoding the
epitopes, with gene III of fd phage, and expressing the fused gene in phage-infected cells. The cells manufactured
fusion phage which displayed the epitopes on their surface; the phage which bound to immobilized antibody were
eluted with acid and studied. In both cases, the fused gene featured a segment encoding a spacer region to separate
the variable region from the wild type pIII sequence so that the varied amino acids would not be constrained by the
nearby pIII sequence. Devlin et al. (DEVL90) similarly screened, using M13 phage, for random 15 residue epitopes
recognized by streptavidin. Again, a spacer was used to move the random peptides away from the rest of the chimeric
phage protein. These references therefore taught away from constraining the conformational repertoire of the mutated
residues.

[0019] Another problem with the Scott and Smith, Cwirla et al., and Devlin et al. libraries was that they provided a
highly biased sampling of the possible amino acids at each position. Their primary concern in designing the degenerate
oligonucleotide encoding their variable region was to ensure that all twenty amino acids were encodible at each position;
a secondary consideration was minimizing the frequency of occurrence of stop signals. Consequently, Scott and Smith
and Cwirla et al. employed NNK (N = equal mixture of G, A, T, C; K = equal mixture of G and T) while Devlin et al. used
NNS (S = equal mixture of G and C). There was no attempt to minimize the frequency ratio of most favored-to-least
favored amino acid, or to equalize the rate of occurrence of acidic and basic amino acids.
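
The bias described above can be checked by enumerating the codons that each degenerate scheme allows. The following is a minimal sketch, assuming the standard genetic code; the names `codon_usage`, `bias_ratio`, and `acidic_vs_basic` are illustrative and do not come from the cited references, and counting only Lys and Arg as basic is likewise an assumption of the sketch.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Degenerate base symbols used in the text: N = G/A/T/C, K = G/T, S = G/C.
MIXES = {"N": "GATC", "K": "GT", "S": "GC"}


def codon_usage(scheme: str) -> Counter:
    """Count how many allowed codons encode each amino acid (or stop)."""
    counts = Counter()
    for codon in product(*(MIXES[symbol] for symbol in scheme)):
        counts[CODON_TABLE["".join(codon)]] += 1
    return counts


def bias_ratio(counts: Counter) -> float:
    """Ratio of the most favored to the least favored amino acid (stops ignored)."""
    amino_counts = [n for aa, n in counts.items() if aa != "*"]
    return max(amino_counts) / min(amino_counts)


def acidic_vs_basic(counts: Counter) -> tuple:
    """Codon counts for acidic (Asp, Glu) versus basic (Lys, Arg) residues."""
    return counts["D"] + counts["E"], counts["K"] + counts["R"]


nnk = codon_usage("NNK")
print(sum(nnk.values()), nnk["*"], bias_ratio(nnk), acidic_vs_basic(nnk))
# 32 codons, 1 stop, most-to-least favored ratio 3.0, acidic:basic = 2:4
```

Under these assumptions NNK and NNS give the same distribution: Leu, Arg, and Ser each receive three codons while twelve amino acids receive only one, and basic codons outnumber acidic codons two to one.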

[0020] Devlin et al. characterized several affinity-selected streptavidin-binding peptides, but did not measure the
affinity constants for these peptides. Cwirla et al. did determine the affinity constant for their peptides, but were
disappointed to find that their best hexapeptides had affinities (350-300 nM) "orders of magnitude" weaker than that of the
native Met-enkephalin epitope (7 nM) recognized by the target antibody. Cwirla et al. speculated that phage bearing
peptides with higher affinities remained bound under acidic elution, possibly because of multivalent interactions
between phage (carrying about 4 copies of pIII) and the divalent target IgG. Scott and Smith were able to find peptides
whose affinity for the target antibody (A2) was comparable to that of the reference myohemerythrin epitope (50 nM).
However, Scott and Smith likewise expressed concern that some high-affinity peptides were lost, possibly through
irreversible binding of fusion phage to target.
nonbiological synthesis on solid supports. While they

[00211 Lam. et al. (LAM91) createda pentapeptide "^^.^^^ "^^^^^ m roughV equlmolar proportions. 

body, subdoning the SCAD gene into SP^a^,"^^ chromatography. The onhr antigen 

surface of phage X. and selecting phage «^l^nollS!S taS. carfe/organfems. oroutersurface prote ns 
mentioned is t)ovinegrDWthhorTnone^Noott«rbMng^^^^^ 

fSrUera„dBird.WOe.0Sa01 

iors (DNA-binding proteins) may ^^^J^"^.^^^^^^^^ for use in the design of asymmetnc 

(PANT90>. Pantoiiano and Ladner(PANTB7>. Pabo f"^^^'^*'^^!^^ of imown proteins displayed 

il^MM Lfldner.etal..W090/02809de8cril«sBemlrandommutegen^fi^^^^^ 

«d"o,;«insofseS^fificiaioutersurfaceprotelnsorD^™^ 

desired binding characteristics. THe ^^f'^^^^^^^^^^z^m disulfides: 58 AAs). and BPTI (5:55. 

!Xdre;Cp=srm«^^^^^^^ 

SUMMARY OF THE INVENTION

A ^ o ch«i« Chain of the same or different amino acids Joined by peptide 
[00271 A polypeptide is a polymer composed of a ^^'^.^J^'" °' ™ ^.^ugh intemal rotations aboutthe 

K-Unear^^descantateupaverylargenj^^^^ 

niainchainsinglebondsofeachacfl^^^^^ 

rdSt^rr^^^^^ 

hydrophobic interactions are sufficient to ^f^^'lf ^f^^S^'f j; " J^njed by a denaturant such as high temperatui^ 
sL^nts are held to more or less that conformaton unlaw ttepeiwroe y ^^^^ ^ ^ ^ ^ 

X or high pH. Whereupon the polypeptide unfo^^ or^^^^ 
confom,ation«rtilbedetem,ined by the environment if a^^ 

1^''ZZZ^^^^^'-^^^^^ ^^"^ '''' "^^"^ " 

. t I.. /Kt It nnt limited tD^l 
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nostic agents, including (but not limited to): 

c) lower antigenicity, and
d) higher activity per mass. 
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of choice. 

[0032] Most polypeptides of this size, however, have disadvantages as binding molecules. According to Olivera et
al. (OLIV90a): "Peptides in this size range normally equilibrate among many conformations (in order to have a fixed
conformation, proteins generally have to be much larger)." Specific binding of a peptide to a target molecule requires
the peptide to take up one conformation that is complementary to the binding site. For a decapeptide with three
isoenergetic conformations (e.g., β strand, α helix, and reverse turn) at each residue, there are about 6×10^4 possible
overall conformations. Assuming these conformations to be equiprobable for the unconstrained decapeptide, if only
one of the possible conformations bound to the binding site, then the affinity of the peptide for the target would be
expected to be about 6×10^4 times higher if it could be constrained to that single effective conformation. Thus, the
unconstrained decapeptide, relative to a decapeptide constrained to the correct conformation, would be expected to
exhibit lower affinity. It would also exhibit lower specificity, since one of the other conformations of the unconstrained
decapeptide might be one which bound tightly to a material other than the intended target. By way of corollary, it could have
less resistance to degradation by proteases, since it would be more likely to provide a binding site for the protease.
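
The conformation count quoted for the decapeptide follows from simple arithmetic: three equally likely states at each of ten residues. A minimal sketch, in which the function name `conformations` is purely illustrative:

```python
def conformations(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of overall conformations if each residue independently
    adopts one of `states_per_residue` isoenergetic states."""
    return states_per_residue ** n_residues


n = conformations(10)
print(n)  # 59049, i.e. about 6e4

# If only one conformation binds and all are equiprobable, the unconstrained
# peptide spends 1/n of its time in the binding conformation, so constraining
# it to that conformation would raise the expected affinity by a factor of n.
print(f"bound-competent fraction: {1 / n:.2e}")
```

The estimate rests on the paragraph's own simplifying assumptions (independent residues, equal energies, a single productive conformation) and is an order-of-magnitude argument rather than a measurement.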
[0033] The present invention overcomes these problems, while retaining the advantages of smaller polypeptides,
by identifying novel mini-proteins having the desired binding characteristics. Mini-proteins are small polypeptides
which, while too small to have a stable conformation as a result of noncovalent forces alone, are covalently crosslinked
(e.g., by disulfide bonds) into a stable conformation and hence have biological activities more typical of larger protein
molecules than of unconstrained polypeptides of comparable size. The mini-proteins with which the present invention
is particularly concerned fall into two categories: (a) disulfide-bonded micro-proteins of less than 40 amino acids; and
(b) metal ion-coordinated mini-proteins of less than 60 amino acids.

[0034] The present invention relates to the construction, expression, and selection of mutated genes that specify
novel mini-proteins with desirable binding properties, as well as these mini-proteins themselves, and the "libraries" of
mutant "genetic packages" used to display the mini-proteins to a potential "target" material. The "targets" may be, but
need not be, proteins. Targets may include other biological or synthetic macromolecules as well as other organic and
inorganic substances.

[0035] The prior application, WO90/02809, generally teaches that stable protein domains may be mutated in order
to identify new proteins with desirable binding characteristics. Among the suitable "parental" proteins which it
specifically identifies as useful for this purpose are three proteins, BPTI (58 residues), the third domain of ovomucoid (56
residues), and crambin (46 residues), which are in the size range of 40-60 residues wherein noncovalent interactions
between nonadjacent amino acids become significant; all three also contain three disulfide bonds that enhance the
stability of the molecule.

[0036] Nowhere in WO90/02809 does one find any specific recognition that a polypeptide with less than 40 residues,
and especially those with only one or two disulfide bonds, would have sufficient stability to serve as a "scaffolding" for
mutational variation. These "micro-proteins" are, nonetheless, of great utility, as previously indicated.

[0037] WO90/02809 also suggests the use of a protein, azurin, having a different form of crosslink (Cu:CYS,HIS,
HIS,MET). However, azurin has 128 amino acids, so it cannot possibly be considered a mini-protein. The present
invention relates to the use of mini-proteins of less than 60 amino acids which feature a metal ion-coordinated crosslink.

[0038] By virtue of the present invention, proteins are obtained which can bind specifically to targets other than the
antigen-combining sites of antibodies. A protein is not to be considered a "binding protein" merely because it can be
bound by an antibody (see definition of "binding protein" which follows). While almost any amino acid sequence of
more than about 6-8 amino acids is likely, when linked to an immunogenic carrier, to elicit an immune response, any
given random polypeptide is unlikely to satisfy the stringent definition of "binding protein" with respect to minimum
affinity and specificity for its substrate. It is only by testing numerous random polypeptides simultaneously (and, in the
usual case, controlling the extent and character of the sequence variation, i.e., limiting it to residues of a potential
binding domain having a stable structure, the residues being chosen as more likely to affect binding than stability) that
this obstacle is overcome.

[0039] The appended claims are hereby incorporated by reference into this specification as an enumeration of the
preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

[0040]

Figure 1 shows the main chain of scorpion toxin (Brookhaven Protein Data Bank entry 1SN3) residues 20 through
42. CYS25 and CYS41 are shown forming a disulfide. In the native protein these groups form disulfides to other
cysteines, but no main-chain motion is required to bring the gamma sulphurs into acceptable geometry. Residues,
other than GLY, are labeled at the β carbon with the one-letter code.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
I. INTRODUCTION 
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enriching population forthetmit.T7,epre8ertinv^tion^^^^ 

domains. Vn^T^^m of mlcro-proteins. V?„f,^LSSS aent^be on tha outer surface of a rep- 

binding properties by 1 > arranging that the ^^''"^'^'^'^^^ afflnltv selection - selection 

lioabie genetic padcage (GP) (acell. «P°';f "^^'^^^^J "^^^ 

forbindingtothetargetmaterial-to enrK*thepopuMon 0^^^ g^„^ 
proteins with improved binding to that ta^ "'^f Jl'^"i'i^rt7riroduce. The evolution is fomed' in that 

K'l.edl^^gy^fi'Btpertectedbyn^dHyingag^^^^^ 

E^^ss^Tirr^r^^^^^^^ 

;-rreSrchosen.^_a^^^^^^^ 

can be dispiayea on a Bur.a«. «. « F^—a- """T:;^^ "wariaaation" which after approprlfltecioning ana iimHi..«-u.-.. 
toaspecialpatlernofrnultipiemutagenesl^heretejwd-js vrtiichdlsSays a single potential binding 

steps leads to the prtxlucUon of a popuja^on^ 9^"f^ P^'^u^oflTrenttho^^^ 
doi.aln(amutantofthelPBD).butwhtohco«e^^dJ2"v^r^^^ 

binding domains (PBDs). Each gene^c P«^9eca"^^tt.e v^on^ ^^^^^ 
on the surface of that particular package. Aff'"**/ ^etections men usm ^ ^^^^ 
PBDS with the desired binding characteristic.. ^-^^^^^^^,,"^^^^^^^^9 <SBDs> 
cycles of enrichment by affinity selection and amplrfication. the DMA encoaingx 

may then be recovered from selected packages ^len be further "variegated", using an SBD of the 

binding domain" (PPBD). . j^ently crossiinked In the parental molecule 

IP04B1 When microi^roteins are variegated. tt,e J'"^ '^ ^^^"^^^ .iiegaBon of a disuifWe bonded micro- 

L left unchanged, thereby '^^^^''^^Z'^'^ Session a'nd display, covalent crosslinks te^ 

protein, certain cysteines are imranant so that und^thewn^^^^^ 

^, disulfide bonds between one or more pa^ otq^h^^^^ 

liter (preferably,* 10-7 molesfliter) „„t,hfldv"ln(1)aboveislntendedtomakadearthatforthepurposes 
T>,flexclusk>nof"vartebledomamofanantlbody"mO)ar)ow_Bu^ 

h^n% proWln is not to be considered a "binding protein- mere.yv^ 

pS Mostlarger ptoteinsfold Into '''^'''JS^l^.^tS^^J^t^^ overall structuro in theface 

Shen aligned, van^ at some codons so as toen^^^^^ 
codon encodes, are determined in advance by the synthesizer of the DNA, even though the synthetic method does
not allow one to know, a priori, the sequence of any individual DNA molecule in the mixture. The number of designated
variable codons in the variegated DNA is preferably no more than 20 codons, and more preferably no more than 5-10
codons. The mix of amino acids encoded at each variable codon may differ from codon to codon.
[0052] A population of genetic packages into which variegated DNA has been introduced is likewise said to be
"variegated".

[0053] For the purposes of this invention, the term "potential binding protein" (PBP) refers to a protein encoded by
one species of DNA molecule in a population of variegated DNA wherein the region of variation appears in one or more
subsequences encoding one or more segments of the polypeptide having the potential of serving as a binding domain
for the target substance.

[0054] A "chimeric protein" is a fusion of a first amino acid sequence (protein) with a second amino acid sequence
defining a domain foreign to and not substantially homologous with any domain of the first protein. A chimeric protein
may present a foreign domain which is found (albeit in a different protein) in an organism which also expresses the
first protein, or it may be an "interspecies", "intergeneric", etc. fusion of protein structures expressed by different kinds
of organisms.

[0055] One amino acid sequence of the chimeric proteins of the present invention is typically derived from an outer
surface protein of a "genetic package" (GP) as hereafter defined. One which displays a PBD on its surface is a
GP(PBD). The second amino acid sequence is one which, if expressed alone, would have the characteristics of a protein
(or a domain thereof) but is incorporated into the chimeric protein as a recognizable domain thereof. It may appear at
the amino or carboxy terminal of the first amino acid sequence (with or without an intervening spacer), or it may interrupt
the first amino acid sequence. The first amino acid sequence may correspond exactly to a surface protein of the genetic
package, or it may be modified, e.g., to facilitate the display of the binding domain.

II. MICRO- AND OTHER MINI-PROTEINS

[0056] In the present invention, disulfide bonded micro-proteins and metal-containing mini-proteins are used both
as IPBDs in verifying a display strategy, and as PPBDs in actually seeking to obtain a BD with the desired
target-binding characteristics. Unless otherwise stated or required by context, references herein to IPBDs should be taken
to apply, mutatis mutandis, to PPBDs as well.

[0057] For the purpose of the appended claims, a micro-protein has between about six and about forty residues;
micro-proteins are a subset of mini-proteins, which have less than about sixty residues. Since micro-proteins form a
subset of mini-proteins, for convenience the term mini-proteins will be used on occasion to refer to both
disulfide-bonded micro-proteins and metal-coordinated mini-proteins.
[0058] The IPBD may be a mini-protein with a known binding activity, or one which, while not possessing a known
binding activity, possesses a secondary or higher structure that lends itself to binding activity (clefts, grooves, etc.).
When the IPBD does have a known binding activity, it need not have any specific affinity for the target material. The
IPBD need not be identical in sequence to a naturally-occurring mini-protein; it may be a "homologue" with an amino acid
sequence which "substantially corresponds" to that of a known mini-protein, or it may be wholly artificial.
[0059] In determining whether sequences should be deemed to "substantially correspond", one should consider the
following issues: the degree of sequence similarity when the sequences are aligned for best fit according to standard
algorithms, the similarity in the connectivity patterns of any crosslinks (e.g., disulfide bonds), the degree to which the
proteins have similar three-dimensional structures, as indicated by, e.g., X-ray diffraction analysis or NMR, and the
degree to which the sequenced proteins have similar biologic activity. In this context, it should be noted that among
the serine protease inhibitors, there are families of proteins recognized to be homologous in which there are pairs of
members with as little as 30% sequence homology.

[0060] A candidate IPBD should meet the following criteria:

1) a domain exists that will remain stable under the conditions of its intended use (the domain may comprise the
entire protein that will be inserted, e.g., α-conotoxin GI (OLIV90a), or CMTI-III (MCWH89)),
2) knowledge of the amino acid sequence is obtainable, and

3) a molecule is obtainable having specific and high affinity for the IPBD, abbreviated AfM(IPBD).

[0061] If only one species of molecule having affinity for IPBD (AfM(IPBD)) is available, it will be used to: a) detect
the IPBD on the GP surface, b) optimize expression level and density of the affinity molecule on the matrix, and c)
determine the efficiency and sensitivity of the affinity separation. One would prefer to have available two species of
AfM(IPBD), one with high and one with moderate affinity for the IPBD. The species with high affinity would be used in
initial detection and in determining efficiency and sensitivity, and the species with moderate affinity would be used in
optimization.
[0062] If the IPBD is not itself a known binding protein, or if its native target has not been purified, an antibody raised
against the IPBD may be used as the affinity molecule. Use of an antibody for this purpose should not be taken to
mean that the antibody is the ultimate target.

[0063] There are many candidate IPBDs for which all of the above information is available or is reasonably practical
to obtain, for example, CMTI-III (29 residues) (CMTI-type inhibitors are described in OTLE87, FAVE89, WIEC85,
MCWH89, BODE89, HOLA89a,b), heat-stable enterotoxin (ST-Ia of E. coli) (18 residues) (GUAR89, BHAT86, SEKI85,
SHIM87, TAKA85, TAKE90, THOM85a,b, YOSH85, DALL90, DWAR89, GARI87, GUZM89, GUZM90, HOUG84,
KUBO89, KUPE90, OKAM87, OKAM88, and OKAM90), α-Conotoxin GI (13 residues) (HASH85, ALMQ89),
μ-Conotoxin GIII (22 residues) (HIDO90), and Conus King Kong micro-protein (27 residues) (WOOD90). Structural information
can be obtained from X-ray or neutron diffraction studies, NMR, chemical cross linking or labeling, modeling from
known structures of related proteins, or from theoretical calculations. 3D structural information obtained by X-ray
diffraction, neutron diffraction or NMR is preferred because these methods allow localization of almost all of the atoms
to within defined limits. Table 50 lists several preferred IPBDs.

[0064] Mutations may reduce the stability of the PBD. Hence the chosen IPBD should preferably have a high melting
temperature, e.g., at least 50°C, and preferably be stable over a wide pH range, e.g., 8.0 to 3.0, but more preferably
11.0 to 2.0, so that the SBDs derived from the chosen IPBD by mutation and selection-through-binding will retain
sufficient stability. Preferably, the substitutions in the IPBD yielding the various PBDs do not reduce the melting point
of the domain below about 40°C. It will be appreciated that mini-proteins contain covalent crosslinks, such as one or more
disulfides, and are therefore likely to be sufficiently stable.

[0065] In vitro, disulfide bridges can form spontaneously in polypeptides as a result of air oxidation. Matters are more
complicated in vivo. Very few intracellular proteins have disulfide bridges, probably because a strong reducing
environment is maintained inside cells. Disulfide bridges are common in proteins that travel or operate in
extracellular spaces, such as snake venoms and other toxins (e.g., conotoxins, charybdotoxin, bacterial enterotoxins),
peptide hormones, digestive enzymes, complement proteins, immunoglobulins, lysozymes, protease inhibitors (BPTI
and its homologues, CMTI-III (Cucurbita maxima trypsin inhibitor III) and its homologues, hirudin, etc.) and milk proteins.
[0066] Disulfide bonds that close tight intrachain loops have been found in pepsin, thioredoxin, insulin A-chain, silk
fibroin, and lipoamide dehydrogenase. The bridged cysteine residues are separated by one to four residues along the
polypeptide chain. Model building, X-ray diffraction analysis, and NMR studies have shown that the α carbon path of
such loops is usually flat and rigid.

[0067] There are two types of disulfide bridges in immunoglobulins. One is the conserved intrachain bridge, spanning
about 60 to 70 amino acid residues and found, repeatedly, in almost every immunoglobulin domain. Buried deep
between the opposing β sheets, these bridges are shielded from solvent and ordinarily can be reduced only in the presence
of denaturing agents. The remaining disulfide bridges are mainly interchain bonds and are located on the surface of
the molecule; they are accessible to solvent and relatively easily reduced (STEI85). The disulfide bridges of the
micro-proteins of the present invention are intrachain linkages between cysteines having much smaller chain spacings.
[0068] When a micro-protein contains a plurality of disulfide bonds, it is preferable that at least two cysteines be
clustered, i.e., are immediately adjacent along the chain (-C-C-) or are separated by a single amino acid (-C-X-C-). In
either case, the two clustered cysteines become unable to pair with each other for steric reasons, and the number of
realizable topologies is reduced.

[0069] An intrachain disulfide bridge connecting amino acids 3 and 8 of a 16 residue polypeptide will be said herein
to have a span of 4. If amino acids 4 and 12 are also disulfide bonded, then they form a second span of 7. Together,
the four cysteines divide the polypeptide into four intercysteine segments (1-2, 5-7, 9-11, and 13-16). (Note that there
is no segment between Cys3 and Cys4.) The connectivity pattern of a crosslinked micro-protein is a simple description
of the relative location of the termini of the crosslinks. For example, for a micro-protein with two disulfide bonds, the
connectivity pattern "1-3, 2-4" means that the first crosslinked cysteine is disulfide bonded to the third crosslinked
cysteine (in the primary sequence), and the second to the fourth.
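
The definitions of span, intercysteine segment, and connectivity pattern can be made concrete with a short sketch. The function names below are illustrative only; the worked values reproduce the 16-residue example given in the text (bridges 3-8 and 4-12).

```python
def span(i: int, j: int) -> int:
    """Number of residues enclosed between two bridged cysteines."""
    return abs(j - i) - 1


def intercysteine_segments(length: int, cysteines: list) -> list:
    """Runs of non-cysteine residues as (first, last) pairs, 1-based inclusive."""
    segments = []
    start = 1
    for pos in sorted(cysteines):
        if pos > start:                  # skip empty runs (adjacent cysteines)
            segments.append((start, pos - 1))
        start = pos + 1
    if start <= length:
        segments.append((start, length))
    return segments


def connectivity(bonds: list) -> str:
    """Describe the bridges by the ordinal rank of each cysteine in the sequence."""
    ordered = sorted(pos for bond in bonds for pos in bond)
    rank = {pos: k + 1 for k, pos in enumerate(ordered)}
    pairs = sorted((rank[min(b)], rank[max(b)]) for b in bonds)
    return ", ".join(f"{a}-{b}" for a, b in pairs)


bonds = [(3, 8), (4, 12)]
print([span(i, j) for i, j in bonds])                # [4, 7]
print(intercysteine_segments(16, [3, 8, 4, 12]))     # [(1, 2), (5, 7), (9, 11), (13, 16)]
print(connectivity(bonds))                           # 1-3, 2-4
```

Because Cys3 and Cys4 are adjacent, the empty run between them is skipped, which is why four segments result rather than five.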

[0070] The degree to which the crosslink constrains the conformational freedom of the mini-protein, and the degree
to which it stabilizes the mini-protein, may be assessed by a number of means. These include absorption spectroscopy
(which can reveal whether an amino acid is buried or exposed), circular dichroism studies (which provide a general
picture of the helical content of the protein), nuclear magnetic resonance imaging (which reveals the number of nuclei
in a particular chemical environment as well as the mobility of nuclei), and X-ray or neutron diffraction analysis of protein
crystals. The stability of the mini-protein may be ascertained by monitoring the changes in absorption at various
wavelengths as a function of temperature, pH, etc.; buried residues become exposed as the protein unfolds. Similarly, the
unfolding of the mini-protein as a result of denaturing conditions results in changes in NMR line positions and widths.
Circular dichroism (CD) spectra are extremely sensitive to conformation.

[0071] The variegated disulfide-bonded micro-proteins of the present invention fall into several classes.

[0072] Class I micro-proteins are those featuring a single pair of cysteines capable of interacting to form a disulfide
bond, said bond having a span of no more than about nine residues. This disulfide bridge preferably has a span of at
least two residues; this is a function of the geometry of the disulfide bond. When the spacing is two or three residues,
one residue is preferably glycine in order to reduce the strain on the bridged residues. The upper limit on spacing is
less precise; however, in general, the greater the spacing, the less the constraint on conformation imposed on the
linearly intermediate amino acid residues by the disulfide bond.

[0073] The main chain of such a peptide has very little freedom, but is not stressed. The free energy released when
the disulfide forms exceeds the free energy lost by the main-chain when locked into a conformation that brings the
cysteines together. Having lost the free energy of disulfide formation, the proximal ends of the side groups are held in
more or less fixed relation to each other. When binding to a target, the domain does not need to expend free energy
getting into the correct conformation. The domain cannot jump into some other conformation and bind a non-target.

[0074] A disulfide bridge with a span of 4 or 5 is especially preferred. If the span is increased to 6, the constraining
influence is reduced. In this case, we prefer that at least one of the enclosed residues be an amino acid that imposes
restrictions on the main-chain geometry. Proline imposes the most restriction. Valine and isoleucine restrict the main
chain to a lesser extent. The preferred position for this constraining non-cysteine residue is adjacent to one of the
invariant cysteines; however, it may be one of the other bridged residues. If the span is seven, we prefer to include
two amino acids that limit main-chain conformation. These amino acids could be at any of the seven positions, but are
preferably the two bridged residues that are immediately adjacent to the cysteines. If the span is eight or nine, additional
constraining amino acids may be provided.

[0075] While a class I micro-protein may have up to 40 amino acids, more preferably it is no more than 20 amino acids.
[0076] The disulfide bond of a class I micro-protein is exposed to solvent. Thus, one usually should avoid exposing
the variegated population of GPs that display class I micro-proteins to reagents that rupture disulfides.

[0077] Class II micro-proteins are those featuring a single disulfide bond having a span of greater than nine amino
acids. The bridged amino acids form secondary structures which help to stabilize their conformation. Preferably, these
intermediate amino acids form hairpin supersecondary structures such as those schematized below:



[0078] Based on studies of known proteins, one may calculate the propensity of a particular residue, or of a particular
dipeptide or tripeptide, to be found in an α helix, β strand or reverse turn. The normalized frequencies of occurrence
of the amino acid residues in these secondary structures are given in Table 6-4 of CREI84. For a more detailed treatment
of the prediction of secondary structure from the amino acid sequence, see Chapter 6 of SCHU79.

[0079] In designing a suitable hairpin structure, one may copy an actual structure from a protein whose three-dimensional
conformation is known, design the structure using frequency data, or combine the two approaches. Preferably,
one or more actual structures are used as a model, and the frequency data is used to determine which mutations can
be made without disrupting the structure.

[0080] Preferably, no more than three amino acids lie between the cysteine and the beginning or end of the α helix
or β strand.

[0081] More complex structures (such as a double hairpin) are also possible. 

[0082] Class IIIa micro-proteins are those featuring two disulfide bonds. They optionally may also feature secondary
structures such as those discussed above with regard to class II micro-proteins. With two disulfide bonds, there are
three possible topologies; if desired, the number of realizable disulfide bonding topologies may be reduced by clustering
cysteines as in heat-stable enterotoxin ST-Ia.
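
The count of three topologies for two disulfide bonds can be checked by listing every way of pairing four cysteines. The following is a minimal Python sketch; the function name `disulfide_topologies` and the labeling of cysteines by integers are illustrative assumptions, not part of the specification.

```python
def disulfide_topologies(cysteines):
    """Return every way to pair all the given cysteines into disulfide bonds."""
    if not cysteines:
        return [[]]
    first, rest = cysteines[0], cysteines[1:]
    pairings = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in disulfide_topologies(remaining):
            pairings.append([(first, partner)] + sub)
    return pairings

# Cysteines are labeled by their order along the chain (hypothetical labels).
two_bonds = disulfide_topologies([1, 2, 3, 4])
three_bonds = disulfide_topologies([1, 2, 3, 4, 5, 6])
print(len(two_bonds), len(three_bonds))
```

The number of pairings grows quickly with the number of cysteines, which is one way to see why reducing the set of realizable topologies can be attractive.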

[0083] Class IIIb micro-proteins are those featuring three or more disulfide bonds and preferably at least one cluster
of cysteines as previously described.

[0084] Metal Finger Mini-Proteins. The present invention also relates to mini-proteins which are not crosslinked by
disulfide bonds, e.g., analogues of finger proteins. Finger proteins are characterized by finger structures in which a
metal ion is coordinated by two Cys and two His residues, forming a tetrahedral arrangement around it. The metal ion
is most often zinc(II), but may be iron, copper, cobalt, etc. The "finger" has the consensus sequence (Phe or Tyr)-(1
AA)-Cys-(2-4 AAs)-Cys-(3 AAs)-Phe-(6 AAs)-Leu-(2 AAs)-His-(3 AAs)-His-(5 AAs) (BERG88; GIBS88). While finger
proteins typically contain many repeats of the finger motif, it is known that a single finger will fold in the presence of
zinc ions (FRAN87; PARR88). There is some dispute as to whether two fingers are necessary for binding to DNA. The
present invention encompasses mini-proteins with either one or two fingers. Other combinations of side groups can
lead to formation of crosslinks involving multivalent metal ions. Summers (SUMM91), for example, reports an 18-amino-acid
mini-protein found in the capsid protein of HIV-1-F1 and having three cysteines and one histidine that bind a zinc
atom. It is to be understood that the target need not be a nucleic acid.

G. Modified PBSs 

[0085] There exist a number of enzymes and chemical reagents that can selectively modify certain side groups of
proteins, including: protein-tyrosine kinase, Ellman's reagent, methyl transferases (that methylate GLU side groups),
serine kinases, proline hydroxylases, vitamin-K dependent enzymes that convert GLU to GLA, maleic anhydride, and
alkylating agents. Treatment of the variegated population of GP(PBD)s with one of these enzymes or reagents will
modify the side groups affected by the chosen enzyme or reagent. Enzymes and reagents that do not kill the GP are
much preferred. Such modification of side groups can directly affect the binding properties of the displayed PBDs.
Using affinity separation methods, we enrich for the modified GPs that bind the predetermined target. Since the active
binding domain is not entirely genetically specified, we must repeat the post-morphogenesis modification at each
enrichment round. This approach is particularly appropriate with mini-protein IPBDs because we envision chemical
synthesis of these SBDs.

III. VARIEGATION STRATEGY - MUTAGENESIS TO OBTAIN POTENTIAL BINDING DOMAINS WITH DESIRED
DIVERSITY

III.A. Generally

[0086] When the number of different amino acid sequences obtainable by mutation of the domain is large when
compared to the number of different domains which are displayable in detectable amounts, the efficiency of the forced
evolution is greatly enhanced by careful choice of which residues are to be varied. First, residues of a known protein
which are likely to affect its binding activity (e.g., surface residues) and not likely to unduly degrade its stability are
identified. Then all or some of the codons encoding these residues are varied simultaneously to produce a variegated
population of DNA. Groups of surface residues that are close enough together on the surface to touch one molecule
of target simultaneously are preferred sets for simultaneous variegation. The variegated population of DNA is used to
express a variety of potential binding domains, whose ability to bind the target of interest may then be evaluated.
[0087] The method of the present invention is thus further distinguished from other methods in the nature of the
highly variegated population that is produced and from which novel binding proteins are selected. We force the
displayed potential binding domain to sample the nearby "sequence space" of related amino-acid sequences in an efficient,
organized manner. Four goals guide the various variegation plans used herein, preferably: 1) a very large number (e.g.
10^) of variants is available, 2) a very high percentage of the possible variants actually appears in detectable amounts,
3) the frequency of appearance of the desired variants is relatively uniform, and 4) variation occurs only at a limited
number of amino-acid residues, most preferably at residues having side groups directed toward a common region on
the surface of the potential binding domain.

[0088] This is to be distinguished from the simple use of indiscriminate mutagenic agents such as radiation and
hydroxylamine to modify a gene, where there is no (or very oblique) control over the site of mutation. Many of the
mutations will affect residues that are not a part of the binding domain. When chemical mutagens are directed toward
the whole genome, most mutations occur in genes other than the one encoding the potential binding domain. Moreover,
since at a reasonable level of mutagenesis, any modified codon is likely to be characterized by a single base change,
only a limited and biased range of possibilities will be explored. Equally remote is the use of site-specific mutagenesis
techniques employing mutagenic oligonucleotides of nonrandomized sequence, since these techniques do not lend
themselves to the production and testing of a large number of variants. While focused random mutagenesis techniques
are known, the importance of controlling the distribution of variation has been largely overlooked.

[0089] The potential binding domains are first designed at the amino acid level. Once we have identified which
residues are to be mutagenized, and which mutations to allow at those positions, we may then design the variegated DNA
which is to encode the various PBDs so as to assure that there is a reasonable probability that if a PBD has an affinity
for the target, it will be detected. Of course, the number of independent transformants obtained and the sensitivity of
the affinity separation technology will impose limits on the extent of variegation possible within any single round of
variegation.

[0090] There are many ways to generate diversity in a protein. (See RICH86, CARU85, and OLIP86.) At one extreme,
we vary a few residues of the protein as much as possible (inter alia see CARU85, CARU87, RICH86, and WHAR86).
We will call this approach "Focused Mutagenesis". A typical "Focused Mutagenesis" strategy is to pick a set of five to
seven residues and vary each through 13-20 possibilities. An alternative plan of mutagenesis ("Diffuse Mutagenesis")
is to vary many more residues through a more limited set of choices (see VERS86a and PAKU86). The variegation
pattern adopted may fall between these extremes, e.g., two residues varied through all twenty amino acids, two more
through only two possibilities, and a fifth into ten of the twenty amino acids.

[0091] There is no fixed limit on the number of codons which can be mutated simultaneously. However, it is desirable
to adopt a mutagenesis strategy which results in a reasonable probability that a possible PBD sequence is in fact
displayed by at least one genetic package. Preferably, the probability that a mutein encoded by the vgDNA and
composed of the least favored amino acids at each variegated position will be displayed by at least one independent
transformant in the library is at least 0.50, and more preferably at least 0.90. (Muteins composed of more favored amino
acids would of course be more likely to occur in the same library.)
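
The display-probability criterion above can be illustrated numerically. This is a minimal sketch that assumes each independent transformant carries one DNA sequence drawn at random from the variegated pool; the function names and the five-codon NNT example are hypothetical choices made for illustration.

```python
import math

def prob_displayed(p_sequence, n_transformants):
    """Chance that a DNA sequence of per-clone probability p appears at least once."""
    return 1.0 - (1.0 - p_sequence) ** n_transformants

def transformants_needed(p_sequence, target=0.90):
    """Smallest number of independent transformants reaching probability `target`."""
    return math.ceil(math.log(1.0 - target) / math.log1p(-p_sequence))

# Hypothetical example: five NNT codons. A least favored amino acid has one
# codon out of 16, so the least favored mutein has probability (1/16)**5.
p_least = (1 / 16) ** 5
n_half = transformants_needed(p_least, 0.50)
n_ninety = transformants_needed(p_least, 0.90)
print(n_half, n_ninety)
```

Under these assumptions, raising the target from 0.50 to 0.90 costs a little more than a threefold increase in library size.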

[0092] Preferably, the variegation is such as will cause a typical transformant population to display 10^-10^ different
amino acid sequences by means of preferably not more than 10-fold more (more preferably not more than 3-fold)
different DNA sequences.

[0093] For a Class I micro-protein that lacks α helices and β strands, one will, in any given round of mutation,
preferably variegate each of 4-8 non-cysteine codons so that they each encode at least eight of the 20 possible amino
acids. The variegation at each codon could be customized to that position. Preferably, cysteine is not one of the potential
substitutions, though it is not excluded.

[0094] When the mini-protein is a metal finger protein, in a typical variegation strategy, the two Cys and two His
residues, and optionally also the aforementioned Phe/Tyr, Phe and Leu residues, are held invariant and a plurality
(usually 5-10) of the other residues are varied.

[0095] When the micro-protein is of the type featuring one or more α helices and β strands, the set of potential amino
acid modifications at any given position is picked to favor those which are less likely to disrupt the secondary structure
at that position. Since the number of possibilities at each variable amino acid is more limited, the total number of variable
amino acids may be greater without altering the sampling efficiency of the selection process.

[0096] For class III micro-proteins, preferably not more than 20 and more preferably 5-10 codons will be variegated.
However, if diffuse mutagenesis is employed, the number of codons which are variegated can be higher.
[0097] While variegation normally will involve the substitution of one amino acid for another at a designated variable
codon, it may involve the insertion or deletion of amino acids as well.

III.B. Identification of Residues to be Varied

[0098] We now consider the principles that guide our choice of residues of the IPBD to vary. A key concept is that
only structured proteins exhibit specific binding, i.e. can bind to a particular chemical entity to the exclusion of most
others. Thus the residues to be varied are chosen with an eye to preserving the underlying IPBD structure. Substitutions
that prevent the PBD from folding will cause GPs carrying those genes to bind indiscriminately so that they can easily
be removed from the population. Substitutions of amino acids that are exposed to solvent are less likely to affect the
3D structure than are substitutions at internal loci. (See PAKU86, REID88a, EISE85, SCHU79, p169-171 and CREI84,
p239-245, 314-315.) Internal residues are frequently conserved and the amino acid type cannot be changed to a
significantly different type without substantial risk that the protein structure will be disrupted. Nevertheless, some
conservative changes of internal residues, such as I to L or F to Y, are tolerated. Such conservative changes subtly affect
the placement and dynamics of adjacent protein residues and such "fine tuning" may be useful once an SBD is found.
Insertions and deletions are more readily tolerated in loops than elsewhere (THOR88).

[0099] Data about the IPBD and the target that are useful in deciding which residues to vary in the variegation cycle
include: 1) 3D structure, or at least a list of residues on the surface of the IPBD, 2) list of sequences homologous to
IPBD, and 3) model of the target molecule or a stand-in for the target.

III.C. Determining the Substitution Set for Each Parental Residue

[0100] Having picked which residues to vary, we now decide the range of amino acids to allow at each variable
residue. The total level of variegation is the product of the number of variants at each varied residue. Each varied
residue can have a different scheme of variegation, producing 2 to 20 different possibilities. The set of amino acids
which are potentially encoded by a given variegated codon is called its "substitution set".
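
Since the total level of variegation is a simple product, it is easy to tabulate for candidate plans. The sketch below uses the mixed pattern and the Focused Mutagenesis ranges mentioned earlier in this section; the function name `variegation_level` is an illustrative assumption.

```python
from math import prod

def variegation_level(choices_per_residue):
    """Total number of encoded sequences: the product of the per-residue choices."""
    return prod(choices_per_residue)

# Mixed pattern from the text: two residues through all twenty amino acids,
# two through only two possibilities, and a fifth through ten.
mixed = variegation_level([20, 20, 2, 2, 10])
# Focused Mutagenesis extremes: five residues x 13 choices, seven residues x 20.
focused_low = variegation_level([13] * 5)
focused_high = variegation_level([20] * 7)
print(mixed, focused_low, focused_high)
```

Comparing these totals with the number of independent transformants that can realistically be obtained shows which plans can be sampled in a single round.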
[0101] The computer that controls a DNA synthesizer, such as the Milligen 7500, can be programmed to synthesize
any base of an oligo-nt with any distribution of nts by taking some nt substrates (e.g. nt phosphoramidites) from each
of two or more reservoirs. Alternatively, nt substrates can be mixed in any ratios and placed in one of the extra reservoirs
for so-called "dirty bottle" synthesis. Each codon could be programmed differently. The "mix" of bases at each nucleotide
position of the codon determines the relative frequency of occurrence of the different amino acids encoded by that
codon.

[0102] Simply variegated codons are those in which those nucleotide positions which are degenerate are obtained
from a mixture of two or more bases mixed in equimolar proportions. These mixtures are described in this specification
by means of the standardized "ambiguous nucleotide" code. In this code, for example, in the degenerate codon "SNT",
"S" denotes an equimolar mixture of bases G and C, "N" an equimolar mixture of all four bases, and "T" the single
invariant base thymidine.
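
To make the ambiguity notation concrete, a degenerate codon can be expanded into the concrete codons it covers. This is a minimal sketch; the dictionary lists the standard ambiguity symbols, and the helper name `expand_codon` is an illustrative assumption.

```python
from itertools import product

# Standard nucleotide ambiguity symbols and the bases each one stands for.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

def expand_codon(degenerate):
    """List every concrete codon covered by a simply variegated codon."""
    pools = [AMBIGUITY[symbol] for symbol in degenerate.upper()]
    return ["".join(bases) for bases in product(*pools)]

snt = expand_codon("SNT")
print(len(snt), snt)
```

Because the mixtures are equimolar, each codon in the expanded list is expected at the same frequency.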

[0103] Complexly variegated codons are those in which at least one of the three positions is filled by a base from
an other than equimolar mixture of two or more bases.

[0104] Either simply or complexly variegated codons may be used to achieve the desired substitution set.

[0105] If we have no information indicating that a particular amino acid or class of amino acid is appropriate, we
strive to substitute all amino acids with equal probability because representation of one mini-protein above the
detectable level is wasteful. Equal amounts of all four nts at each position in a codon (NNN) yields the amino acid distribution
in which each amino acid is present in proportion to the number of codons that code for it. This distribution has the
disadvantage of giving two basic residues for every acidic residue. In addition, six times as much R, S, and L as W or
M occur. If five codons are synthesized with this distribution, each of the 243 sequences encoding some combination
of L, R, and S is 7776 times more abundant than each of the 32 sequences encoding some combination of W and
M. To have five Ws present at detectable levels, we must have each of the (L,R,S) sequences present in 7776-fold
excess.
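
The figures quoted for NNN follow directly from the codon counts of the standard genetic code. The short sketch below reproduces them; counting only LYS and ARG as basic is an assumption made here to match the two-to-one ratio stated in the text.

```python
# Number of codons for the amino acids named in the text (standard genetic code).
CODONS = {"L": 6, "R": 6, "S": 6, "W": 1, "M": 1, "K": 2, "D": 2, "E": 2}
N_POSITIONS = 5

n_lrs_sequences = 3 ** N_POSITIONS   # peptides built only from L, R and S
n_wm_sequences = 2 ** N_POSITIONS    # peptides built only from W and M
abundance_ratio = (CODONS["L"] // CODONS["W"]) ** N_POSITIONS
# Assumption: "basic" means K and R here, "acidic" means D and E.
basic_per_acidic = (CODONS["K"] + CODONS["R"]) / (CODONS["D"] + CODONS["E"])
print(n_lrs_sequences, n_wm_sequences, abundance_ratio, basic_per_acidic)
```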

[0106] Particular amino acid residues can influence the tertiary structure of a defined polypeptide in several ways,
including by:

a) affecting the flexibility of the polypeptide main chain,

b) adding hydrophobic groups,

c) adding charged groups,

d) allowing hydrogen bonds, and

e) forming cross-links, such as disulfides, chelation to metal ions, or bonding to prosthetic groups.

[0107] Lundeen (LUND86) has tabulated the frequencies of amino acids in helices, β strands, turns, and coil in
proteins of known 3D structure and has distinguished between CYSs having free thiol groups and half cystines. He
reports that free CYS is found most often in helices while half cystines are found more often in β sheets. Half cystines
are, however, regularly found in helices. Pease et al. (PEAS90) constructed a peptide having two cystines; one end
of each is in a very stable α helix. Apamin has a similar structure (WEMM83, PEAS88).

Flexibility: 

[0108] GLY is the smallest amino acid, having two hydrogens attached to the Cα. Because GLY has no Cβ, it confers
the most flexibility on the main chain. Thus GLY occurs very frequently in reverse turns, particularly in conjunction with
PRO, ASP, ASN, SER, and THR.

[0109] The amino acids ALA, SER, CYS, ASP, ASN, LEU, MET, PHE, TYR, TRP, ARG, HIS, GLU, GLN, and LYS
have unbranched β carbons. Of these, the side groups of SER, ASP, and ASN frequently make hydrogen bonds to the
main chain and so can take on main-chain conformations that are energetically unfavorable for the others. VAL, ILE,
and THR have branched β carbons, which makes the extended main-chain conformation more favorable. Thus VAL
and ILE are most often seen in β sheets. Because the side group of THR can easily form hydrogen bonds to the main
chain, it has less tendency to exist in a β sheet.

[0110] The main chain of proline is particularly constrained by the cyclic side group. The φ angle is always close to
-60°. Most prolines are found near the surface of the protein.

Charge: 

[0111] LYS and ARG carry a single positive charge at any pH below 10.4 or 12.0, respectively. Nevertheless, the
methylene groups, four and three respectively, of these amino acids are capable of hydrophobic interactions. The
guanidinium group of ARG is capable of donating five hydrogens simultaneously, while the amino group of LYS can
donate only three. Furthermore, the geometries of these groups are quite different, so that these groups are often not
interchangeable.

[0112] ASP and GLU carry a single negative charge at any pH above ≈4.5 and 4.6, respectively. Because ASP has
but one methylene group, few hydrophobic interactions are possible. The geometry of ASP lends itself to forming
hydrogen bonds to main-chain nitrogens, which is consistent with ASP being found very often in reverse turns and at
the beginning of helices. GLU is more often found in α helices and particularly in the amino-terminal portion of these
helices because the negative charge of the side group has a stabilizing interaction with the helix dipole (NICH88,
SALI88).

[0113] HIS has an ionization pK in the physiological range, viz. 6.2. This pK can be altered by the proximity of charged
groups or of hydrogen donors or acceptors. HIS is capable of forming bonds to metal ions such as zinc, copper, and
iron.

Hydrogen bonds: 

[0114] Aside from the charged amino acids, SER, THR, ASN, GLN, TYR, and TRP can participate in hydrogen bonds.

Cross links: 

[0115] The most important form of cross link is the disulfide bond formed between the thiols of CYS residues. In a
suitably oxidizing environment, these bonds form spontaneously. These bonds can greatly stabilize a particular
conformation of a protein or mini-protein. When a mixture of oxidized and reduced thiol reagents is present, exchange
reactions take place that allow the most stable conformation to predominate. Concerning disulfides in proteins and
peptides, see also KATZ90, MATS89, PERR84, PERR86, SAUE86, WELL86, JANA89, HORV89, KISH85, and
SCHN86.

[0116] Other cross links that form without need of specific enzymes include:

1) (CYS)4:Fe            Rubredoxin (in CREI84, p.376)
2) (CYS)4:Zn            Aspartate Transcarbamylase (in CREI84, p.376) and Zn-fingers (HARD90)
3) (HIS)2(MET)(CYS):Cu  Azurin (in CREI84, p.376) and Basic "Blue" Cu Cucumber protein (GUSS88)
4) (HIS)4:Cu            CuZn superoxide dismutase
5) (CYS)4:(Fe4S4)       Ferredoxin (in CREI84, p.376)
6) (CYS)2(HIS)2:Zn      Zinc-fingers (GIBS88, SUMM91)
7) (CYS)3(HIS):Zn       Zinc-fingers (GAUS87, GIBS88)

Cross links having (HIS)2(MET)(CYS):Cu have the potential advantage that HIS and MET cannot form other cross
links without Cu.

Simply Variegated Codons

[0117] The following simply variegated codons are useful because they encode a relatively balanced set of amino
acids:

1) SNT which encodes the set [L,P,H,R,V,A,D,G]: a) one acidic (D) and one basic (R), b) both aliphatic (L,V) and
aromatic hydrophobics (H), c) large (L,R,H) and small (G,A) side groups, d) rigid (P) and flexible (G) amino acids,
e) each amino acid encoded once.

2) RNG which encodes the set [M,T,K,R,V,A,E,G]: a) one acidic and two basic (not optimal, but acceptable), b)
hydrophilics and hydrophobics, c) each amino acid encoded once.

3) RMG which encodes the set [T,K,A,E]: a) one acidic, one basic, one neutral hydrophilic, b) three favor α helices,
c) each amino acid encoded once.

4) VNT which encodes the set [L,P,H,R,I,T,N,S,V,A,D,G]: a) one acidic, one basic, b) all classes: charged, neutral
hydrophilic, hydrophobic, rigid and flexible, etc., c) each amino acid encoded once.

5) RRS which encodes the set [N,S,K,R,D,E,G2]: a) two acidics, two basics, b) two neutral hydrophilics, c) only
glycine encoded twice.

6) NNT which encodes the set [F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G]: a) sixteen DNA sequences provide fifteen different
amino acids; only serine is repeated, all others are present in equal amounts (this allows very efficient sampling
of the library), b) there are equal numbers of acidic and basic amino acids (D and R, once each), c) all major
classes of amino acids are present: acidic, basic, aliphatic hydrophobic, aromatic hydrophobic, and neutral
hydrophilic.

7) NNG, which encodes the set [L2,R2,S,W,P,Q,M,T,K,V,A,E,G,stop]: a) fair preponderance of residues that favor
formation of α helices [L,M,A,Q,K,E; and, to a lesser extent, S,R,T]; b) encodes 13 different amino acids. (VHG
encodes a subset of the set encoded by NNG which encodes 9 amino acids in nine different DNA sequences, with
equal acids and bases, and 5/9 being α helix-favoring.)

[0118] For the initial variegation, NNT is preferred, in most cases. However, when the codon is encoding an amino
acid to be incorporated into an α helix, NNG is preferred.
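
The substitution sets listed above can be checked by translating each degenerate codon with the standard genetic code. The following is a minimal sketch; the helper name `substitution_set` and the compact encoding of the code table are illustrative choices, and "*" marks a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, indexed in the usual T, C, A, G order.
GENETIC_CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
                for i, a in enumerate(BASES)
                for j, b in enumerate(BASES)
                for k, c in enumerate(BASES)}
AMBIGUITY = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "S": "CG",
             "R": "AG", "M": "AC", "K": "GT", "V": "ACG", "H": "ACT"}

def substitution_set(degenerate):
    """Count how many codons of a simply variegated codon encode each residue."""
    pools = [AMBIGUITY[symbol] for symbol in degenerate]
    return Counter(GENETIC_CODE["".join(c)] for c in product(*pools))

for codon in ["SNT", "RNG", "RMG", "VNT", "RRS", "NNT", "NNG", "VHG"]:
    counts = substitution_set(codon)
    print(codon, sum(counts.values()), "codons:", dict(counts))
```

A count greater than one for a residue means that residue is over-represented relative to the others in the set.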

[0119] Below, we analyze several simple variegations as to the efficiency with which the libraries can be sampled.
[0120] Libraries of random hexapeptides encoded by (NNK)6 have been reported (SCOT90, CWIR90). Table 130
shows the expected behavior of such libraries. NNK produces single codons for PHE, TYR, CYS, TRP, HIS, GLN, ILE,
MET, ASN, LYS, ASP, and GLU (α set); two codons for each of VAL, ALA, PRO, THR, and GLY (Φ set); and three
codons for each of LEU, ARG, and SER (Ω set). We have separated the 64,000,000 possible sequences into 28
classes, shown in Table 130A, based on the number of amino acids from each of these sets. The largest class is
ΦΩαααα with ≈14.6% of the possible sequences. Aside from any selection, all the sequences in one class have the
same probability of being produced. Table 130B shows the probability that a given DNA sequence taken from the
(NNK)6 library will encode a hexapeptide belonging to one of the defined classes; note that only ≈6.3% of DNA
sequences belong to the ΦΩαααα class.
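
The class bookkeeping for the NNK hexapeptide library can be reproduced with a short enumeration. This is a sketch under the assumption that stop-containing DNA sequences are ignored; the set sizes and codon multiplicities are those stated in the text, and all function and variable names are illustrative.

```python
from math import factorial

# (number of amino acids in the set, NNK codons per amino acid), from the text.
ALPHA, PHI, OMEGA = (12, 1), (5, 2), (3, 3)
LENGTH = 6

def nnk_classes(length=LENGTH):
    """Map (n_alpha, n_phi, n_omega) -> (peptide count, DNA sequence count)."""
    table = {}
    for a in range(length + 1):
        for f in range(length - a + 1):
            o = length - a - f
            arrangements = factorial(length) // (
                factorial(a) * factorial(f) * factorial(o))
            peptides = arrangements * ALPHA[0] ** a * PHI[0] ** f * OMEGA[0] ** o
            dna = peptides * ALPHA[1] ** a * PHI[1] ** f * OMEGA[1] ** o
            table[(a, f, o)] = (peptides, dna)
    return table

classes = nnk_classes()
total_peptides = sum(p for p, _ in classes.values())
total_dna = sum(d for _, d in classes.values())  # stop-containing DNA excluded
largest = max(classes, key=lambda key: classes[key][0])
peptide_share = classes[largest][0] / total_peptides
dna_share = classes[largest][1] / total_dna
print(len(classes), largest, round(peptide_share, 3), round(dna_share, 3))
```

The gap between a class's share of peptides and its share of DNA sequences is what makes some classes much harder to sample than others.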

[0121] Table 130C shows the expected numbers of sequences in each class for libraries containing various numbers
of independent transformants (viz. 10^6, 3·10^6, 10^7, 3·10^7, 10^8, 3·10^8, 10^9, and 3·10^9). At 10^6 independent
transformants (ITs), we expect to see 56% of the ΩΩΩΩΩΩ class, but only 0.1% of the αααααα class. Most of the sequences
seen come from classes for which less than 10% of the class is sampled. Suppose a peptide from, for example, class
ΦΦΩΩαα is isolated by fractionating the library for binding to a target. Consider how much we know about peptides
that are related to the isolated sequence. Because only 4% of the ΦΦΩΩαα class was sampled, we cannot conclude
that the amino acids from the Ω set are in fact the best from the Ω set. We might have LEU at position 2, but ARG or
SER could be better. Even if we isolate a peptide of the ΩΩΩΩΩΩ class, there is a noticeable chance that better
members of the class were not present in the library.

[0122] With a library of 10^7 ITs, we see that several classes have been completely sampled, but that the αααααα
class is only 1.1% sampled. At 7.6·10^7 ITs, we expect display of 50% of all possible sequences, but the classes
containing three or more amino acids of the α set are still poorly sampled. To achieve complete sampling of the (NNK)6
library requires about 3·10^9 ITs, 10-fold larger than the largest (NNK)6 library so far reported.
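
The sampling percentages in this discussion are consistent with a simple Poisson model, sketched below. The model assumes every stop-free (NNK)6 DNA sequence is equally likely in each independent transformant; that assumption, and the function names, are illustrative rather than taken from the specification.

```python
from math import exp, factorial

STOP_FREE_DNA = 31 ** 6  # (NNK)6 DNA sequences that contain no stop codon

def fraction_sampled(n_its, n_phi, n_omega):
    """Expected fraction of a class seen among n_its independent transformants.

    A peptide with n_phi two-codon and n_omega three-codon residues is encoded
    by 2**n_phi * 3**n_omega DNA sequences; other residues have one codon each.
    """
    encodings = 2 ** n_phi * 3 ** n_omega
    return 1.0 - exp(-n_its * encodings / STOP_FREE_DNA)

def overall_fraction(n_its, length=6):
    """Expected fraction of all 20**length peptides that are displayed."""
    seen = 0.0
    for a in range(length + 1):
        for f in range(length - a + 1):
            o = length - a - f
            size = (factorial(length)
                    // (factorial(a) * factorial(f) * factorial(o))
                    * 12 ** a * 5 ** f * 3 ** o)
            seen += size * fraction_sampled(n_its, f, o)
    return seen / 20 ** length

print(round(fraction_sampled(1e6, 0, 6), 2))  # all six residues from the omega set
print(round(fraction_sampled(1e6, 2, 2), 2))  # two phi, two omega, two alpha
print(round(overall_fraction(7.6e7), 2))
```

The same two functions can be evaluated at other library sizes to see how slowly the classes rich in single-codon amino acids fill in.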

[0123] Table 131 shows expectations for a library encoded by (NNT)4(NNG)2. The expectations of abundance are
independent of the order of the codons or of interspersed unvaried codons. This library encodes fewer
amino-acid sequences, but there are only 0.0165 times as many DNA sequences. Thus far fewer ITs
than required for (NNK)6 give almost complete sampling of the library. The results would be slightly better for (NNT)6
and slightly, but not much, worse for (NNG)6. The controlling factor is the ratio of DNA sequences to amino-acid
sequences.

[0124] Table 132 shows the ratio of #DNA sequences/#AA sequences for codons NNK, NNT, and NNG. For NNK
and NNG, we have assumed that the PBD is displayed as part of an essential gene, such as gene III in Ff phage, as
is indicated by the phrase "assuming stops vanish". It is not in any way required that such an essential gene be used.
If a non-essential gene is used, the analysis would be slightly different; sampling of NNK and NNG would be slightly
less efficient. Note that (NNT)6 gives 3.6-fold more amino-acid sequences than (NNK)5 but requires fewer
DNA sequences. Note also that (NNT)7 gives thrice as many amino-acid sequences as (NNK)6, but 3.3-fold fewer DNA
sequences.
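
The comparisons among NNK, NNT and NNG libraries reduce to counting amino-acid sequences and stop-free DNA sequences. The sketch below does that count; the per-codon figures come from the standard genetic code with stop codons dropped ("assuming stops vanish"), and the function and variable names are illustrative assumptions.

```python
# Per-codon counts: (distinct amino acids, DNA codons with stop codons removed).
CODON_STATS = {"NNK": (20, 31), "NNT": (15, 16), "NNG": (13, 15)}

def library_size(design):
    """Return (amino-acid sequences, stop-free DNA sequences).

    `design` is a list of (codon, repeats) pairs, e.g. [("NNT", 4), ("NNG", 2)].
    """
    n_aa, n_dna = 1, 1
    for codon, repeats in design:
        aa, dna = CODON_STATS[codon]
        n_aa *= aa ** repeats
        n_dna *= dna ** repeats
    return n_aa, n_dna

aa_t6, dna_t6 = library_size([("NNT", 6)])
aa_k5, dna_k5 = library_size([("NNK", 5)])
aa_t7, dna_t7 = library_size([("NNT", 7)])
aa_k6, dna_k6 = library_size([("NNK", 6)])
aa_mix, dna_mix = library_size([("NNT", 4), ("NNG", 2)])

print(round(aa_t6 / aa_k5, 1), round(dna_k5 / dna_t6, 1))
print(round(aa_t7 / aa_k6, 1), round(dna_k6 / dna_t7, 1))
print(round(aa_mix / aa_k6, 3), round(dna_mix / dna_k6, 4))
```

The lower the ratio of DNA sequences to amino-acid sequences, the fewer transformants are spent on redundant encodings of the same peptide.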

[0125] Thus, while it is possible to use a simple mixture (NNS, NNK or NNN) to obtain at a particular position all
twenty amino acids, these simple mixtures lead to a highly biased set of encoded amino acids. This problem can be
overcome by use of complexly variegated codons.

Complexly Variegated Codons 

[0126] The nt distribution ("fxS") within the codon that allows all twenty amino acids and that yields the largest ratio
of abundance of the least favored amino acid (lfaa) to that of the most favored amino acid (mfaa), subject to the
constraints of equal abundances of acidic and basic amino acids, least possible number of stop codons, and, for
convenience, the third base being T or G, is shown in Table 10A and yields DNA molecules encoding each type of
amino acid with the abundances shown. Other vg codons are obtainable by relaxing one or more of the constraints.

[0127] Note that this chemistry encodes all twenty amino acids, with acidic and basic amino acids equally abundant,
and the most favored amino acid (serine) is encoded only 2.454 times as often as the least favored amino acid
(tryptophan). The "fxS" vg codon improves sampling most for peptides containing several of the amino
acids [F,Y,C,W,H,Q,I,M,N,K,D,E] for which NNK or NNS provide only one codon. Its sampling advantages are most pronounced when

p^S"''T?eTr^«gthere^^ 

[0129] The advantages of an NNT codon are discussed elsewhere in the present application. Unoptimized NNT
provides 15 amino acids encoded by only 16 DNA sequences. It is possible to improve on NNT with the distribution
shown in Table 10C, which gives five amino acids (SER, LEU, HIS, VAL, ASP) in very nearly equal amounts. A further
eight amino acids (PHE, TYR, ILE, ASN, PRO, ALA, ARG, GLY) are present at 78% the abundance of SER. THR and
CYS remain at half the abundance of SER. When variegating DNA for disulfide-bonded micro-proteins, it is often
desirable to reduce the prevalence of CYS. This distribution allows 13 amino acids to be seen at high level and gives
no stops; the optimized fxS distribution allows only 11 amino acids at high prevalence.

[0130] The NNG codon can also be optimized. Table 10D shows an approximately optimized ([ALA] = [ARG]) NNG
codon. There are, under this variegation, four equally most favored amino acids: LEU, ARG, ALA, and GLU. Note that
there is one acidic and one basic amino acid in this set. There are two equally least favored amino acids: TRP and
MET. The ratio of lfaa/mfaa is 0.5258. If this codon is repeated six times, peptides composed entirely of TRP and MET
are 2% as common as peptides composed entirely of the most favored amino acids. We refer to this as "the prevalence
of (TRP/MET)6 in optimized NNG6 vgDNA".

[0131] When synthesizing vgDNA by the "dirty bottle" method, it is sometimes desirable to use only a limited number
of mixes. One very useful mixture is called the "optimized NNS mixture" in which we average the first two positions of
the fxS mixture: T1 = 0.24, C1 = 0.17, A1 = 0.33, G1 = 0.26, the second position is identical to the first, C3 = G3 = 0.5.
This distribution provides the amino acids ARG, SER, LEU, GLY, VAL, THR, ASN, and LYS at greater than 5% plus
ALA, ASP, GLU, ILE, MET, and TYR at greater than 4%.
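
The amino-acid distribution produced by a complexly variegated codon can be computed by weighting each codon by the product of its three base frequencies. The sketch below applies this to the optimized NNS mixture, taking the four first-position frequencies to be T, C, A, G = 0.24, 0.17, 0.33, 0.26 (they sum to one) and C = G = 0.5 at the third position; the function name `residue_frequencies` is an illustrative assumption.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code; "*" marks a stop codon.
GENETIC_CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
                for i, a in enumerate(BASES)
                for j, b in enumerate(BASES)
                for k, c in enumerate(BASES)}

FIRST_TWO = {"T": 0.24, "C": 0.17, "A": 0.33, "G": 0.26}
THIRD = {"C": 0.5, "G": 0.5}

def residue_frequencies(pos1, pos2, pos3):
    """Sum base-frequency products over all codons encoding each residue."""
    freq = defaultdict(float)
    for (a, pa), (b, pb), (c, pc) in product(
            pos1.items(), pos2.items(), pos3.items()):
        freq[GENETIC_CODE[a + b + c]] += pa * pb * pc
    return dict(freq)

freqs = residue_frequencies(FIRST_TWO, FIRST_TWO, THIRD)
for residue, value in sorted(freqs.items(), key=lambda item: -item[1]):
    print(residue, round(100 * value, 2))
```

With these two-decimal frequencies, ILE, MET and TYR come out at about 4%; the unrounded mixture in the original tables may differ slightly.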

[0132] An additional complexly variegated codon is of interest. This codon is identical to the optimized NNT codon
at the first two positions and has T:G::90:10 at the third position. This codon provides thirteen amino acids (ALA, ILE,
ARG, SER, ASP, LEU, VAL, PHE, ASN, GLY, PRO, TYR, and HIS) at more than 5.5%. THR at 4.3% and CYS at 3.9%
are more common than the LFAAs of NNK (3.125%). The remaining five amino acids are present at less than 1%. This
codon has the feature that all amino acids are present; sequences having more than two of the low-abundance amino
acids are rare. When we isolate an SBD using this codon, we can be reasonably sure that the first 13 amino acids
were tested at each position. A similar codon, based on optimized NNG, could be used.

[0133] Table 10E shows some properties of an unoptimized NNS (or NNK) codon. Note that there are three equally most-favored amino acids: ARG, LEU, and SER. There are also twelve equally least favored amino acids: PHE, ILE, MET, TYR, HIS, GLN, ASN, LYS, ASP, GLU, CYS, and TRP. Five amino acids (PRO, THR, ALA, VAL, GLY) fall in between. Note that a six-fold repetition of NNS gives sequences composed of the amino acids [PHE, ILE, MET, TYR, HIS, GLN, ASN, LYS, ASP, GLU, CYS, and TRP] at only ≈0.1% of the sequences composed of [ARG, LEU, and SER]. Not only is this ≈20-fold lower than the prevalence of (TRP/MET)^6 in optimized NNG^6 vgDNA, but this low prevalence applies to twelve amino acids.
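The codon counts behind these statements can be checked by enumerating the 32 NNS codons. This is a minimal sketch assuming the standard genetic code and equimolar bases; it is not taken from the specification.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# NNS: any base at positions 1 and 2, C or G at position 3.
nns = Counter(CODE[b1 + b2 + b3] for b1, b2, b3 in product(BASES, BASES, "CG"))

most_favored = sorted(aa for aa, n in nns.items() if n == 3)
in_between = sorted(aa for aa, n in nns.items() if n == 2)
least_favored = sorted(aa for aa, n in nns.items() if n == 1 and aa != "*")

# A given six-residue sequence of one-codon amino acids versus a given
# six-residue sequence of three-codon amino acids: (1/3)^6.
ratio = (1 / 3) ** 6

print("3 codons:", most_favored)
print("2 codons:", in_between)
print("1 codon :", least_favored)
print(f"six-fold ratio: {ratio:.4%}")
```

The ratio (1/3)^6 = 1/729 is about 0.14%, in line with the "≈0.1%" figure quoted above.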

Diffuse Mutagenesis

[0134] Diffuse Mutagenesis can be applied to any part of the protein at any time, but is most appropriate when some binding to the target has been established. Diffuse Mutagenesis can be accomplished by spiking each of the pure nts activated for DNA synthesis (e.g. nt-phosphoramidites) with a small amount of one or more of the other activated nts. Preferably, the level of spiking is set so that only a small percentage (1% to .00001%, for example) of the final product will contain the initial DNA sequence. This will insure that many single, double, triple, and higher mutations occur, but that recovery of the basic sequence will be a possible outcome.
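The relation between spiking level and the fraction of unmutated product can be sketched as follows. The model is an assumption made for illustration: every base position is spiked identically and independently, so the parental fraction is (1 - s)^L for a total non-parental fraction s per position over L bases. The function names and the 90-base example length are hypothetical.

```python
from math import comb


def parental_fraction(spike, n_bases):
    """Fraction of product carrying the unmutated sequence."""
    return (1.0 - spike) ** n_bases


def spike_for_parental_fraction(fraction, n_bases):
    """Per-position spiking level that leaves `fraction` of product unmutated."""
    return 1.0 - fraction ** (1.0 / n_bases)


def mutation_count_distribution(spike, n_bases, max_k):
    """Binomial probabilities of exactly 0..max_k base substitutions."""
    return [
        comb(n_bases, k) * spike**k * (1.0 - spike) ** (n_bases - k)
        for k in range(max_k + 1)
    ]


n_bases = 90  # hypothetical: 30 codons subjected to diffuse mutagenesis
spike = spike_for_parental_fraction(0.01, n_bases)  # aim for 1% parental
dist = mutation_count_distribution(spike, n_bases, 3)

print(f"spike per position: {spike:.4f}")
for k, p in enumerate(dist):
    print(f"{k} substitutions: {p:.4f}")
```

Note that a base substitution is not always an amino-acid change, so the protein-level parental fraction is somewhat higher than the DNA-level figure computed here.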

III.D. Special Considerations Relating to Variegation of Micro-Proteins with Essential Cysteines

[0135] Several of the preferred simple or complex variegated codons encode a set of amino acids which includes cysteine. This means that some of the encoded binding domains will feature one or more cysteines in addition to the invariant disulfide-bonded cysteines. For example, at each NNT-encoded position, there is a one in sixteen chance of obtaining cysteine. If six codons are so varied, the fraction of domains containing additional cysteine is 0.33. Odd numbers of cysteines can lead to complications, see Perry and Wetzel (PERR84). On the other hand, many disulfide-containing proteins contain cysteines that do not form disulfides, e.g. trypsin. The possibility of unpaired cysteines can be dealt with in several ways:

First, the variegated phage population can be passed over an immobilized reagent that strongly binds free thiols, such as SulfoLink (catalogue number 44895 H from Pierce Chemical Company, Rockford, Illinois, 61105). Another product from Pierce is TNB-Thiol Agarose (Catalogue Code 20409 H). BioRad sells Affi-Gel 401 (catalogue 153-4599) for this purpose.

Second, one can use a variegation that excludes cysteines, such as:

NHT that gives [F,S,Y,L,P,H,I,T,N,V,A,D],
VNS that gives [L2,P2,H,Q,R3,I,M,T2,N,K,S,V2,A2,E,D,G2],
NNG that gives [L2,S,W,P,Q,R2,M,T,K,V,A,E,G,stop],
SNT that gives [L,P,H,R,V,A,D,G],
RNG that gives [M,T,K,R,V,A,E,G],
RMG that gives [T,K,A,E],
VNT that gives [L,P,H,R,I,T,N,S,V,A,D,G], or
RRS that gives [N,S,K,R,D,E,G2].



However, each of these schemes has one or more of the following disadvantages, relative to NNT: a) fewer amino acids are allowed, b) amino acids are not evenly provided, c) acidic and basic amino acids are not equally likely, or d) stop codons occur. Nonetheless, NNG, NHT, and VNT are almost as useful as NNT. NNG encodes 13 different amino acids and one stop signal. Only two amino acids appear twice in the 16-fold mix.

[0136] Thirdly, one can enrich the population for binding to the preselected target, and evaluate selected sequences post hoc for extra cysteines. Those that contain more cysteines than the cysteines provided for conformational constraint may be perfectly usable. It is possible that a disulfide linkage other than the designed one will occur. This does not mean that the binding domain defined by the isolated DNA sequence is in any way unsuitable. The suitability of the isolated domains is best determined by chemical and biochemical evaluation of chemically synthesized peptides.

[0137] Lastly, one can block free thiols with reagents, such as Ellman's reagent, iodoacetate, or methyl iodide, that specifically bind free thiols and that do not react with disulfides, and then leave the modified phage in the population. It is to be understood that the blocking agent may alter the binding properties of the micro-protein; thus, one might use a variety of blocking reagents in expectation that different binding domains will be found. The variegated population of thiol-blocked genetic packages is fractionated for binding. If the DNA sequence of the isolated binding micro-protein contains an odd number of cysteines, then synthetic means are used to prepare micro-proteins having each possible linkage and in which the odd thiol is appropriately blocked. Nishiuchi (NISH82, NISH86, and works cited therein) discloses methods of synthesizing peptides that contain a plurality of cysteines so that each thiol is protected with a different type of blocking group. These groups can be selectively removed so that the disulfide pairing can be controlled. We envision using such a scheme with the alteration that one thiol either remains blocked, or is unblocked and then re-blocked with a different reagent.
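The cysteine-free codon sets named in the second approach, and the chance of picking up an extra cysteine with NNT, can be checked by expanding each degenerate codon. The sketch assumes the standard genetic code, the IUPAC degenerate-base alphabet, and equimolar mixtures; the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degenerate-base symbols.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(codon):
    """All concrete codons encoded by a degenerate codon such as 'NNT'."""
    return ["".join(c) for c in product(*(IUPAC[x] for x in codon))]


def amino_acids(codon):
    """Sorted amino acids (with multiplicity) encoded by a degenerate codon."""
    return sorted(CODE[c] for c in expand(codon))


def extra_cys_fraction(codon, n_positions):
    """Fraction of domains with at least one Cys among n variegated positions."""
    codons = expand(codon)
    p_cys = sum(CODE[c] == "C" for c in codons) / len(codons)
    return 1.0 - (1.0 - p_cys) ** n_positions


cys_free = ["NHT", "VNS", "NNG", "SNT", "RNG", "RMG", "VNT", "RRS"]
for codon in cys_free:
    print(codon, "".join(amino_acids(codon)))

print(f"NNT, six positions: {extra_cys_fraction('NNT', 6):.3f}")
```

With independent, equimolar positions the six-codon NNT figure evaluates to 1 - (15/16)^6, about 0.32, close to the 0.33 quoted in the text.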

III.E. Planning the Second and Later Rounds of Variegation



[0138] The method of the present invention allows efficient accumulation of information concerning the amino-acid sequence of a binding domain having high affinity for a predetermined target. Although one may obtain a highly useful binding domain from a single round of variegation and affinity enrichment, we expect that multiple rounds will be needed to achieve the highest possible affinity and specificity.

[0139] If the first round of variegation results in some binding to the target, but the affinity for the target is still too low, further improvement may be achieved by variegation of the SBDs. Preferably, the process is progressive, i.e. each variegation cycle produces a better starting point for the next variegation cycle than the previous cycle produced. Setting the level of variegation such that the ppbd and many sequences related to the ppbd sequence are present in detectable amounts ensures that the process is progressive.

[0140] If the level of variegation is so high that the ppbd sequence is present at such low levels that there is an appreciable chance that no transformant will display the PPBD, then the best SBD of the next round could be worse than the PPBD. At excessively high levels of variegation, each round of mutagenesis is independent of previous rounds and there is no assurance of progressivity. This approach can lead to valuable binding proteins, but repetition of experiments with this level of variegation will not yield progressive results. Excessive variation is not preferred.

[0141] Progressivity is not an all-or-nothing property. So long as most of the information obtained from previous variegation cycles is retained and many different surfaces that are related to the PPBD surface are produced, the process is progressive.
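Whether the parental sequence is likely to survive into a library can be estimated from the library design and the number of transformants. The following is a rough sketch under stated assumptions, not a method given in the text: transformants are treated as independent draws, every codon of a variegated position is equally likely, and the function names and transformant count are hypothetical.

```python
from math import exp, log1p


def parent_frequency(positions):
    """positions: list of (codons encoding the parental residue, total codons)."""
    freq = 1.0
    for parental, total in positions:
        freq *= parental / total
    return freq


def prob_parent_absent(freq, n_transformants):
    """Probability that no independent transformant carries the parental sequence."""
    return exp(n_transformants * log1p(-freq))


n_transformants = 10_000_000  # hypothetical library size

# Parental residue encoded by one of the 16 NNT codons at each position.
four_positions = parent_frequency([(1, 16)] * 4)
six_positions = parent_frequency([(1, 16)] * 6)

print(f"4 NNT positions: P(absent) = {prob_parent_absent(four_positions, n_transformants):.3g}")
print(f"6 NNT positions: P(absent) = {prob_parent_absent(six_positions, n_transformants):.3g}")
```

In this illustration, varying four positions keeps the parent essentially certain to be present, while varying six leaves an appreciable chance of losing it, which is the situation the text warns against.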

[0142] If the level of variegation in the previous variegation cycle was correctly chosen, then the amino acids selected to be in the residues just varied are the ones best determined. The environment of other residues has changed, so that it is appropriate to vary them again. Because there are often more residues of interest than can be varied simultaneously, we may continue by picking residues that either have never been varied (highest priority) or that have not been varied for one or more cycles.

[0143] Use of NNT or NNG variegated codons leads to very efficient sampling of variegated libraries because the ratio of (different amino-acid sequences)/(different DNA sequences) is much closer to unity than it is for NNK or even the optimized vg codon (fxS). Nevertheless, a few amino acids are omitted in each case. Both NNT and NNG allow members of all important classes of amino acids: hydrophobic, hydrophilic, acidic, basic, neutral hydrophilic, small, and large. After selecting a binding domain, a subsequent variegation and selection may be desirable to achieve a higher affinity or specificity. During this second variegation, amino acid possibilities overlooked by the preceding variegation may be investigated.

[0144] A few examples may be helpful. Suppose we obtained PRO using NNT. This amino acid is available with either NNT or NNG. We can be reasonably sure that PRO is the best amino acid from the set [PRO, LEU, VAL, THR, ALA, ARG, GLY, PHE, TYR, CYS, HIS, ILE, ASN, ASP, SER]. We next might try a set that includes [PRO, TRP, GLN, MET, LYS, GLU]. The set allowed by NNG is the preferred set.
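The complementarity of NNT and NNG, and the sampling-efficiency ratio mentioned earlier, can be made concrete by comparing the amino-acid sets the codons encode. This is a minimal sketch assuming the standard genetic code and equimolar bases; the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Only the degenerate symbols needed here.
IUPAC = {"N": "TCAG", "T": "T", "G": "G", "K": "GT"}


def encoded(codon):
    """Return (set of amino acids excluding stop, number of DNA codons)."""
    codons = ["".join(c) for c in product(*(IUPAC[x] for x in codon))]
    return {CODE[c] for c in codons} - {"*"}, len(codons)


nnt, nnt_codons = encoded("NNT")
nng, nng_codons = encoded("NNG")
nnk, nnk_codons = encoded("NNK")

# Amino acids that a second-round NNG variegation adds to a first-round NNT set.
new_in_nng = sorted(nng - nnt)

# Ratio of distinct amino acids to distinct DNA codons, per position.
efficiency = {
    "NNT": len(nnt) / nnt_codons,
    "NNG": len(nng) / nng_codons,
    "NNK": len(nnk) / nnk_codons,
}

print("new with NNG:", new_in_nng)
print(efficiency)
```

The amino acids newly sampled by NNG are exactly the five named alongside PRO in the example set, and PRO itself is encoded by both codons.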

[0145] What if we obtained HIS instead? Histidine is aromatic and fairly hydrophobic and can form hydrogen bonds to and from the imidazole ring. Tryptophan is hydrophobic and aromatic and can donate a hydrogen to a suitable acceptor and was excluded by the NNT codon. Methionine was also excluded and is hydrophobic. Thus, one preferred course is to use the variegated codon HDS that allows [HIS, GLN, ASN, LYS, TYR, CYS, TRP, ARG, SER, GLY, <stop>].

[0146] If the first round of variegation is entirely unsuccessful, a different pattern of variegation should be used. For example, if more than one interaction set can be defined within a domain, the residues varied in the next round of variegation should be from a different set than that probed in the initial variegation. If repeated failures are encountered, one may switch to a different IPBD.

IV. DISPLAY STRATEGY: DISPLAYING FOREIGN BINDING DOMAINS ON THE SURFACE OF A "GENETIC PACKAGE"

IV.A. General Requirements for Genetic Packages

[0147] In order to obtain the display of a multitude of different though related potential binding domains, applicants generate a heterogeneous population of replicable genetic packages each of which comprises a hybrid gene including a first DNA sequence which encodes a potential binding domain for the target of interest and a second DNA sequence which encodes a display means, such as an outer surface protein native to the genetic package but not natively associated with the potential binding domain (or the parental binding domain to which it is related) which causes the genetic package to display the corresponding chimeric protein (or a processed form thereof) on its outer surface.

[0148] The component of a population that exhibits the desired binding properties may be quite small, for example, one in 10^ or less. Once this component of the population is separated from the non-binding components, it must be possible to amplify it. Culturing viable cells is the most powerful amplification of genetic material known and is preferred. Genetic messages can also be amplified in vitro, e.g. by PCR, but this is not the most preferred method.
[0149] Preferably, the GP can be: 1) genetically altered with reasonable facility to encode a potential binding domain, 2) maintained and amplified in culture, 3) manipulated to display the potential binding protein domain where it can interact with the target material during affinity separation, and 4) affinity separated while retaining the genetic information encoding the displayed binding domain in recoverable form. Preferably, the GP remains viable after affinity separation. Preferred GPs are vegetative bacterial cells, bacterial spores and, especially, bacterial DNA viruses. Eukaryotic cells and eukaryotic viruses may be used as genetic packages, but are not preferred.

[0150] When the genetic package is a bacterial cell, or a phage which is assembled periplasmically, the display means has two components. The first component is a secretion signal which directs the initial expression product to the inner membrane of the cell (a host cell when the package is a phage). This secretion signal is cleaved off by a signal peptidase to yield a processed, mature, potential binding protein. The second component is an outer surface transport signal which directs the package to assemble the processed protein into its outer surface. Preferably, this outer surface transport signal is derived from a surface protein native to the genetic package.
[0151] For example, in a preferred embodiment, the hybrid gene comprises a DNA encoding a potential binding domain operably linked to a signal sequence (e.g., the signal sequences of the bacterial phoA or bla genes or the signal sequence of M13 phage geneIII) and to DNA encoding a coat protein (e.g., the M13 gene III or gene VIII proteins) of a filamentous phage (e.g., M13). The expression product is transported to the inner membrane (lipid bilayer) of the host cell, whereupon the signal peptide is cleaved off to leave a processed hybrid protein. The C-terminus of the coat protein-like component of this hybrid protein is trapped in the lipid bilayer, so that the hybrid protein does not escape into the periplasmic space. (This is typical of the wild-type coat protein.) As the single-stranded DNA of the nascent phage particle passes into the periplasmic space, it collects both wild-type coat protein and the hybrid protein from the lipid bilayer. The hybrid protein is thus packaged into the surface sheath of the filamentous phage, leaving the potential binding domain exposed on its outer surface. (Thus, the filamentous phage, not the host bacterial cell, is the "replicable genetic package" in this embodiment.)

[0152] If a secretion signal is necessary for the display of the potential binding domain, in an especially preferred embodiment the bacterial cell in which the hybrid gene is expressed is of a "secretion-permissive" strain.

[0153] When the genetic package is a bacterial spore, or a phage (such as ΦX174 or λ) whose coat is assembled intracellularly, a secretion signal directing the expression product to the inner membrane of the host bacterial cell is unnecessary. In these cases, the display means is merely the outer surface transport signal, typically a derivative of a spore or phage coat protein.

[0154] GPs are given in Table 2. References to osp-ipbd fusions in this section should be taken to apply, mutatis mutandis, to osp-pbd and osp-sbd fusions as well.

IV.B. Phages for Use as GPs: 

[0155] Periplasmically assembled phage are preferred when the IPBD is a disulfide-containing protein, as such PBDs may not fold within a cell (these proteins may fold after the phage is released from the cell). Intracellularly assembled phage are preferred when the IPBD needs large or insoluble prosthetic groups (such as Fe4S4 clusters), since the IPBD may not fold if secreted because the prosthetic group is lacking in the periplasm.

[0156] When variegation is introduced, multiple infections could generate hybrid GPs that carry the gene for one PBD but have at least some copies of a different PBD on their surfaces; it is preferable to minimize this possibility by infecting cells with phage under conditions resulting in a low multiplicity-of-infection (MOI).

pST' Sl^fagL are excellent candidates for GPs f-'^J^^rhS^;^^^^ 

Uh intact mature phage, and because the genes are Inactive outside a bacterial host, rendering the mature phage 

STFotS^KJophage.thepreferredOSPisusu^lyonethattepresemon^^ 

S TcX lS^!^^ OSP such as M13 gill protein (6 copies/phage) may be an excellent choice as 

°!:s''^r„s:ir.;e^id.t,.^ 

a's^^ndc^p^of the recipient o^gene oTmto a novel engineered ose gene. It is preierreamanneosoaiil^-- 

S5s;""^^^tirchr^.rnr4nd,date^^ 

LeUage are high, ordered^^^^^^ 

s?re°^.«Tnr?:t:.r™ 

fewer). Such a truncated glil protein would be expressed in parallel with the complete gill protein, as gill proiem b 

required for phage infecBvity. ^ particular Interest The 

roiBii The filamentous phage, whteh include M13. f1 , fo, ir , iKe. at, n i , aini r ». k ^^t,^ <.= a 7*1 

polypeptide to be inserted into the inner cell membrane. ,^ „nH to a lesser extent residue 22, 

miMi An E coll sional peptidase (SP-I) recognizes amino acids 18. 21 , and 23. and, to a lesser ^^^'IT^ • 
[01621 An t-TOii signal P"!™""^ ' „,o««t fki iMNBSa KUHNSSb, OLIVBT). After removal of the signal 

domain. 

[0164] We have constructed a tripartite gene comprising:

1) DNA encoding a signal sequence directing secretion of parts (2) and (3) through the inner membrane,

2) DNA encoding the mature BPTI sequence, and

3) DNA encoding the mature M13 gVIII protein.

This gene causes BPTI to appear in active form on the surface of M13 phage.
[0165] The amino-acid sequence of M13 pre-coat (SCHA78), called AA_seq1, is

AA_seq1

         1    1    2  ||2    3    3    4    4    5
    5    0    5    0  \/5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFDSLQASATEYIGYAWA

    5    6    6    7  7
    5    0    5    0  3
MVVVIVGATIGIKLFKKFTSKAS



The site for inserting a novel protein domain into M13 CP is after A23 because SP-I cleaves the precoat protein after A23, as indicated by the arrow. Proteins that can be secreted will appear connected to mature M13 CP at its amino terminus. Because the amino terminus of mature M13 CP is located on the outer surface of the virion, the introduced domain will be displayed on the outside of the virion. The uncertainty of the mechanism by which M13 CP appears in the lipid bilayer raises the possibility that direct insertion of bpti into gene VIII may not yield a functional fusion protein. It may be necessary to change the signal sequence of the fusion to, for example, the phoA signal sequence (MKQSTIALALLPLLFTPVTKA) (MARK91). Marks et al. (MARK86) showed that the phoA signal peptide could direct mature BPTI to the E. coli periplasm.
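The insertion logic described above (signal peptide, then the new domain, then mature coat protein, with cleavage after residue 23) can be sketched at the protein level. Everything here is illustrative: the pre-coat sequence is the 73-residue M13 gene VIII product as commonly reported and should be checked against SCHA78, the `DOMAIN` string is a made-up placeholder rather than BPTI or any real binding domain, and the function names are hypothetical.

```python
# M13 gene VIII pre-coat: 23-residue signal peptide + 50-residue mature coat protein.
# Assumed from public sequence records; verify against the cited reference.
M13_PRECOAT = (
    "MKKSLVLKASVAVATLVPMLSFA"
    "AEGDDPAKAAFDSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"
)
CLEAVAGE_AFTER = 23  # signal peptidase I cleaves after A23
PHOA_SIGNAL = "MKQSTIALALLPLLFTPVTKA"  # alternative signal quoted in the text

DOMAIN = "GCPRILMRCG"  # hypothetical placeholder for a potential binding domain


def build_fusion(precoat, domain, cleavage_after=CLEAVAGE_AFTER, signal=None):
    """Return (precursor, signal length) with `domain` placed after the signal."""
    native_signal = precoat[:cleavage_after]
    mature_coat = precoat[cleavage_after:]
    used_signal = native_signal if signal is None else signal
    return used_signal + domain + mature_coat, len(used_signal)


def process(precursor, signal_length):
    """Model signal-peptidase cleavage: drop the signal peptide."""
    return precursor[signal_length:]


precursor, sig_len = build_fusion(M13_PRECOAT, DOMAIN)
mature = process(precursor, sig_len)

phoa_precursor, phoa_len = build_fusion(M13_PRECOAT, DOMAIN, signal=PHOA_SIGNAL)
phoa_mature = process(phoa_precursor, phoa_len)

print(mature)
print(mature == phoa_mature)
```

Swapping the native signal for the phoA signal changes only the precursor; in this simple model the processed product is identical, which is why the signal can be exchanged if secretion proves inefficient.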

[0166] Another vehicle for displaying the IPBD is by expressing it as a domain of a chimeric gene containing part or all of gene III. This gene encodes one of the minor coat proteins of M13. Genes VI, VII, and IX also encode minor coat proteins. Each of these minor proteins is present in about 5 copies per virion and is related to morphogenesis or infection. In contrast, the major coat protein is present in more than 2500 copies per virion. The gene VI, VII, and IX proteins are present at the ends of the virion; these three proteins are not post-translationally processed (RASC86).

[0167] The single-stranded circular phage DNA associates with about five copies of the gene III protein and is then extruded through the patch of membrane-associated coat protein in such a way that the DNA is encased in a helical sheath of protein (WEBS78). The DNA does not base pair (that would impose severe restrictions on the virus genome); rather the bases intercalate with each other independent of sequence.

[0168] Smith (SMIT85) and de la Cruz et al. (DELA88) have shown that insertions into gene III cause novel protein domains to appear on the virion outer surface. The mini-protein's gene may be fused to gene III at the site used by Smith and by de la Cruz et al., at a codon corresponding to another domain boundary or to a surface loop of the protein, or to the amino terminus of the mature protein.

[0169] All published works use a vector containing a single modified gene III of fd. Thus, all five copies of gIII are identically modified. Gene III is quite large (1272 b.p. or about 20% of the phage genome) and it is uncertain whether a duplicate of the whole gene can be stably inserted into the phage. Furthermore, all five copies of gIII protein are at one end of the virion. When bivalent target molecules (such as antibodies) bind a pentavalent phage, the resulting complex may be irreversible. Irreversible binding of the GP to the target greatly interferes with affinity enrichment of the GPs that carry the genetic sequences encoding the novel polypeptide having the highest affinity for the target.

[0170] To reduce the likelihood of formation of irreversible complexes, we may use a second, synthetic gene that encodes carboxy-terminal parts of III; the carboxy-terminal parts of the gene III protein cause it to assemble into the phage. For example, the final 29 residues (starting with the arginine specified by codon 398) may be enough to cause a fusion protein to assemble into the phage. Alternatively, one might include the final globular domain of mature gIII protein, viz. the final 150 to 160 amino acids of gene III (BASS90). We might, for example, engineer a gene that consists of (from 5' to 3'):

1) a promoter (preferably regulated),

2) a ribosome-binding site,

3) an initiation codon,

4) a functional signal peptide directing secretion of parts (5) and (6) through the inner membrane,

5) DNA encoding an IPBD,

6) DNA encoding residues 275 through 424 of M13 gIII protein,

7) a translation stop codon, and

8) (optionally) a transcription stop signal.

We leave the wild-type gene III so that some unaltered gene III protein will be present. Alternatively, we may use gene VIII protein as the OSP and regulate the osp::ipbd fusion so that only one or a few copies of the fusion protein appear on the phage.

[0171] M13 gene VI, VII, and IX proteins are not processed after translation. The route by which these proteins are assembled into the phage has not been reported. These proteins are necessary for normal morphogenesis and infectivity of the phage. Whether these molecules (gene VI protein, gene VII protein, and gene IX protein) attach themselves to the phage: a) from the cytoplasm, b) from the periplasm, or c) from within the lipid bilayer, is not known. One could use any of these proteins to introduce an IPBD onto the phage surface by one of the constructions:

1) ipbd::pmcp,

2) pmcp::ipbd,

3) signal::ipbd::pmcp, and

4) signal::pmcp::ipbd,

where ipbd represents DNA coding on expression for the initial potential binding domain; pmcp represents DNA coding for one of the phage minor coat proteins, VI, VII, and IX; signal represents a functional secretion signal peptide, such as the phoA signal (MKQSTIALALLPLLFTPVTKA); and "::" represents in-frame genetic fusion. The indicated fusions are placed downstream of a known promoter, preferably a regulated promoter such as lacUV5, tac, or trp. Fusions (1) and (2) are appropriate when the minor coat protein attaches to the phage from the cytoplasm or by autonomous insertion into the lipid bilayer. Fusion (1) is appropriate if the amino terminus of the minor coat protein is free and (2) is appropriate if the carboxy terminus is free. Fusions (3) and (4) are appropriate if the minor coat protein attaches to the phage from the periplasm or from within the lipid bilayer. Fusion (3) is appropriate if the amino terminus of the minor coat protein is free and (4) is appropriate if the carboxy terminus is free.
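The choice among the four constructions reduces to two questions: by what route the minor coat protein reaches the phage, and which of its termini is free. A small decision helper makes the mapping explicit. This is only a restatement of the rules above in code; the function name and the route labels are hypothetical.

```python
# Routes for which no secretion signal is needed: fusions (1) and (2).
NO_SIGNAL_ROUTES = {"cytoplasm", "autonomous_insertion"}
# Routes for which a secretion signal is needed: fusions (3) and (4).
SIGNAL_ROUTES = {"periplasm", "within_lipid_bilayer"}


def choose_fusion(attachment_route, free_terminus):
    """Return the in-frame fusion layout, written 5' to 3' with '::' separators."""
    if attachment_route in NO_SIGNAL_ROUTES:
        parts = []
    elif attachment_route in SIGNAL_ROUTES:
        parts = ["signal"]
    else:
        raise ValueError(f"unknown attachment route: {attachment_route!r}")

    if free_terminus == "amino":
        parts += ["ipbd", "pmcp"]  # IPBD joined at the amino terminus of the coat protein
    elif free_terminus == "carboxy":
        parts += ["pmcp", "ipbd"]  # IPBD joined at the carboxy terminus
    else:
        raise ValueError(f"unknown terminus: {free_terminus!r}")
    return "::".join(parts)


for route in ("cytoplasm", "periplasm"):
    for terminus in ("amino", "carboxy"):
        print(route, terminus, "->", choose_fusion(route, terminus))
```

Because the attachment route of the gene VI, VII, and IX proteins is not known, in practice more than one of these layouts might have to be built and tested.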

[0172] Similar constructions could be made with other filamentous phage. Pf3 is a well known filamentous phage that infects Pseudomonas aeruginosa cells that harbor an IncP-1 plasmid. The major coat protein of Pf3 is unusual in having no signal peptide to direct its secretion. The sequence has charged residues ASP7, ARG37, LYS40, and PHE44-COO-, which is consistent with the amino terminus being exposed. Thus, to cause an IPBD to appear on the surface of Pf3, we construct a tripartite gene comprising:

1) a signal sequence known to cause secretion in P. aeruginosa (preferably known to cause secretion of IPBD) fused in-frame to,

2) a gene fragment encoding the IPBD sequence, fused in-frame to,

3) DNA encoding the mature Pf3 coat protein.

Optionally, DNA encoding a flexible linker of one to 10 amino acids and/or amino acids forming a recognition site for a specific protease (e.g., Factor Xa) is introduced between the ipbd gene fragment and the Pf3 coat-protein gene. This tripartite gene is introduced into Pf3 so that it does not interfere with expression of any Pf3 genes. To reduce the possibility of genetic recombination, part (3) is designed to have numerous silent mutations relative to the wild-type gene. Once the signal sequence is cleaved off, the IPBD is in the periplasm and the mature coat protein acts as an anchor and phage-assembly signal. It does not matter that this fusion protein comes to rest in the lipid bilayer by a route different from the route followed by the wild-type coat protein.
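Designing a second copy of a coat-protein gene with "numerous silent mutations" amounts to replacing codons with synonymous ones. The sketch below shows one simple way to do that, assuming the standard genetic code; it ignores codon-usage preferences of the host, which a real design would need to consider. The input DNA is a made-up fragment, not the Pf3 coat-protein gene, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def silently_recode(dna):
    """Swap each codon for the synonymous codon that differs at the most bases."""
    recoded = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        alternatives = [c for c in SYNONYMS[CODE[codon]] if c != codon]
        best = max(
            alternatives,
            key=lambda c: sum(a != b for a, b in zip(c, codon)),
            default=codon,  # Met and Trp have no synonymous codon
        )
        recoded.append(best)
    return "".join(recoded)


wild_type = "ATGCTGTCTCGTGGT"  # hypothetical fragment encoding M-L-S-R-G
recoded = silently_recode(wild_type)
changed = sum(a != b for a, b in zip(wild_type, recoded))

print(wild_type, translate(wild_type))
print(recoded, translate(recoded))
print("bases changed:", changed)
```

The encoded protein is unchanged while the two DNA copies share fewer identical stretches, which is the property that reduces homologous recombination between them.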

[0173] As described in WO90/02809, other phage, such as bacteriophage ΦX174, large DNA phage such as λ or T4, and even RNA phage, may with suitable adaptations and modifications be used as GPs.

IV.C. Bacterial Cells as Genetic Packages: 

[0174] One may choose any well-characterized bacterial strain which (1) may be grown in culture, (2) may be engineered to display PBDs on its surface, and (3) is compatible with affinity selection.

[0175] Among bacterial cells, the preferred genetic packages are Salmonella typhimurium, Bacillus subtilis, Pseudomonas aeruginosa, Vibrio cholerae, Klebsiella pneumoniae, Neisseria gonorrhoeae, Neisseria meningitidis, Bacteroides nodosus, Moraxella bovis, and especially Escherichia coli. The potential binding mini-protein may be expressed as an insert in a chimeric bacterial outer surface protein (OSP). All bacteria exhibit proteins on their outer surfaces. E. coli is the preferred bacterial GP and, for it, LamB is a preferred OSP.

[0176] While most bacterial proteins remain in the cytoplasm, others are transported to the periplasmic space (which lies between the plasma membrane and the cell wall of gram-negative bacteria), or are conveyed and anchored to the outer surface of the cell. Still others are exported (secreted) into the medium surrounding the cell. Those characteristics of a protein that are recognized by a cell and that cause it to be transported out of the cytoplasm and displayed on the cell surface will be termed "outer-surface transport signals".

[0177] Gram-negative bacteria have outer-membrane proteins (OMP), that form a subset of OSPs. Many OMPs span the membrane one or more times. The signals that cause OMPs to localize in the outer membrane are encoded in the amino acid sequence of the mature protein. Outer membrane proteins of bacteria are initially expressed in a precursor form including a so-called signal peptide. The precursor protein is transported to the inner membrane, and the signal peptide moiety is extruded into the periplasmic space. There, it is cleaved off by a "signal peptidase", and the remaining "mature" protein can now enter the periplasm. Once there, other cellular mechanisms recognize structures in the mature protein which indicate that its proper place is on the outer membrane, and transport it to that location.

[0178] It is well known that the DNA coding for the leader or signal peptide from one protein may be attached to the DNA sequence coding for another protein, protein X, to form a chimeric gene whose expression causes protein X to appear free in the periplasm. The use of export-permissive bacterial strains (LISS85, STAD89) increases the probability that a signal-sequence-fusion will direct the desired protein to the cell surface.

[0179] OSP-IPBD fusion proteins need not fill a structural role in the outer membranes of Gram-negative bacteria because parts of the outer membranes are not highly ordered. For large OSPs there is likely to be one or more sites at which osp can be truncated and fused to ipbd such that cells expressing the fusion will display IPBDs on the cell surface. Fusions of fragments of omp genes with fragments of an x gene have led to X appearing on the outer membrane (CHAR88b,c, BENS84, CLEM81). When such fusions have been made, we can design an osp-ipbd gene by substituting ipbd for x in the DNA sequence. Otherwise, a successful OMP-IPBD fusion is preferably sought by fusing fragments of the best omp to an ipbd, expressing the fused gene, and testing the resultant GPs for display-of-IPBD phenotype. We use the available data about the OMP to pick the point or points of fusion between omp and ipbd to maximize the likelihood that IPBD will be displayed. (Spacer DNA encoding flexible linkers, made, e.g., of GLY, SER, and ASN, may be placed between the osp- and ipbd-derived fragments to facilitate display.) Alternatively, we truncate osp at several sites or in a manner that produces osp fragments of variable length and fuse the osp fragments to ipbd; cells expressing the fusion are screened or selected which display IPBDs on the cell surface. Freudl et al. (FREU89) have shown that fragments of OSPs (such as OmpA) above a certain size are incorporated into the outer membrane. An additional alternative is to include short segments of random DNA in the fusion of omp fragments to ipbd and then screen or select the resulting variegated population for members exhibiting the display-of-IPBD phenotype.
[0180] In E. coli, the LamB protein is a well understood OSP and can be used. The E. coli LamB has been expressed in functional form in S. typhimurium, V. cholerae, and K. pneumoniae, so that one could display a population of PBDs in any of these species as a fusion to E. coli LamB. K. pneumoniae expresses a maltoporin similar to LamB (WEHM89) which could also be used. In P. aeruginosa, the D1 protein (a homologue of LamB) can be used (TRIA88).

[0181] LamB is transported to the outer membrane if a functional N-terminal sequence is present; further, the first 49 amino acids of the mature sequence are required for successful transport (BENS84). As with other OSPs, LamB of E. coli is synthesized with a typical signal-sequence which is subsequently removed. Homology between parts of LamB protein and other outer membrane proteins OmpC, OmpF, and PhoE has been detected (NIKA84), including homology between LamB amino acids 39-49 and sequences of the other proteins. These subsequences may label the proteins for transport to the outer membrane.

[0182] The amino acid sequence of LamB is known (CLEM81), and a model has been developed of how it anchors itself to the outer membrane (reviewed by, among others, BENZ88b). The locations of its maltose and phage binding domains are also known (HEIN88). Using this information, one may identify several strategies by which a PBD insert may be incorporated into LamB to provide a chimeric OSP which displays the PBD on the bacterial outer membrane.

[0183] When the PBDs are to be displayed by a chimeric transmembrane protein like LamB, the PBD could be inserted into a loop normally found on the surface of the cell (cf. BECK83, MANO86). Alternatively, we may fuse a 5' segment of the osp gene to the ipbd gene fragment; the point of fusion is picked to correspond to a surface-exposed loop of the OSP and the carboxy terminal portions of the OSP are omitted. In LamB, it has been found that up to 60 amino acids may be inserted (CHAR88b,c) with display of the foreign epitope resulting; the structural features of OmpC, OmpA, OmpF, and PhoE are so similar that one expects similar behavior from these proteins.

[0184] It should be noted that while LamB may be characterized as a binding protein, it is used in the present invention to provide an OSTS; its binding domains are not variegated.

[0185] Other bacterial outer surface proteins, such as OmpA, OmpC, OmpF, PhoE, and pilin, may be used in place of LamB and its homologues. OmpA is of particular interest because it is very abundant and because homologues are known in a wide variety of gram-negative bacterial species. Baker et al. (BAKE87) review assembly of proteins into the outer membrane of E. coli and cite a topological model of OmpA (VOGE86) that predicts that residues 19-32, 62-73, 105-118, and 147-158 are exposed on the cell surface. Insertion of an ipbd-encoding fragment at about codon 111 or at about codon 152 is likely to cause the IPBD to be displayed on the cell surface. Concerning OmpA, see also MACI88 and MANO88. Porin Protein F of Pseudomonas aeruginosa has been cloned and has sequence homology to OmpA of E. coli (DUCH88). Although this homology is not sufficient to allow prediction of surface-exposed residues on Porin Protein F, the methods used to determine the topological model of OmpA may be applied to Porin Protein F. Works related to use of OmpA as an OSP include BECK80 and MACI88.
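The suggested OmpA insertion points can be checked against the predicted surface-exposed segments with a simple range lookup. The sketch assumes that codon numbers and residue numbers refer to the same numbering of the protein, which the text does not state explicitly; the function name is hypothetical.

```python
# Surface-exposed residue ranges predicted by the topological model cited in the text.
OMPA_EXPOSED = [(19, 32), (62, 73), (105, 118), (147, 158)]


def exposed_loop(position, loops=OMPA_EXPOSED):
    """Return the (start, end) exposed range containing `position`, or None."""
    for start, end in loops:
        if start <= position <= end:
            return (start, end)
    return None


for codon in (111, 152, 90):
    loop = exposed_loop(codon)
    status = f"inside exposed loop {loop}" if loop else "not in a predicted exposed loop"
    print(f"codon {codon}: {status}")
```

Both proposed insertion sites fall near the middle of a predicted loop, which leaves flanking loop residues on either side of the inserted domain.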

[0186] Misra and Benson (MISR88a, MISR88b) disclose a topological model of E. coli OmpC that predicts that, among others, residues GLY164 and LEU250 are exposed on the cell surface. Thus insertion of an ipbd gene fragment at about codon 164 or at about codon 250 of the E. coli ompC gene or at corresponding codons of the S. typhimurium ompC gene is likely to cause IPBD to appear on the cell surface. The ompC genes of
SitJinSISeTiu^rf^^Ced^^^^^ 

thea- part of or^.ln one Of the indicated regions is likely to produceafum^on^^ 

Which leads rfi.lay of IPBD on the cell surface. In particular, ^-fon ^^ S 'toSm^ 

lead to a functional ompRiipbd gene. Concerning OmpF. see also REID88b. PAQE88. BEN5B8. iummb^. an 

piM^^Pilus proteins are of particular Interest because pUiated cells express rrmny copies of th^ ^^^^ 
hJTiL sZral^eoles (N aonorrt»oeae, P. aeruginosa . Moraxella bovte . Bacteroides nodosus, and E^coli) express 

that predicts that theproteinfomisafour4,elK bundle having stmcturalslnillanties to tobaccomo^^^ 
Semerythrin On this model, both the amino and carboxy tennini of the pmtem are exposed. The am^n°^J"« 
TrSlsS Blem^ (ELLEM) has levlewed piiins of Bacteroides nodosus and other spec.es and se otype dlffer- 
en^b'e ^SdlSL In the pliln protein £i^d^ vaH Jon 
amino-tem^mal portions of the pilin protein are highly conserved. J«""'"8S§t SL ^^^EI^S^^^^^ 
foot-and-mouth disease vims (residues 144-159) intothe a nodosi»type4fi^^^ 

to oMi^ccal Diim They found that expression of the 3" -terrrrfnal fusion in R aemginosa led to a iriab e strain tha^ 
lR^*SL£raJjLofthefu«oTprotein.Jenni^^ 

rr^^^eyinsertedaGLY-GLYIinkerbelween the last pilin residue and the fi^^ 
to^^X^le linker-. Thus a preferred place to attach an IPBD Is the cart,oxy ^^'^^^^.^f^'Jff 
4 bllndlc ccuy Biso be u^. although the particular internal fusions tested by Jennings et ah (JENN89) appeared 
to be lethal in P. aenicrlnoaa. Concerning pllln, see also MCKEB5 and 0RNDB5. temlnus 
[0189] Judd (JUDD86, JUDD85) has investigated Protein IA of N. gonorrhoeae and found that the amino terminus
is exposed; thus, one could attach an IPBD at or near the amino terminus of the mature P.IA as a means to display
the IPBD.

[0190] A topological model of PhoE of E. coli has been disclosed by van der Ley et al. (VAND86). This model
predicts eight loops that are exposed; insertion of an IPBD into one of these loops is likely to lead to display of the
IPBD on the surface of the cell. Residues 158, 201, 238, and 275 are preferred locations for insertion of an IPBD.

[0191] Other OSPs that could be used include E. coli BtuB, FepA, FhuA, IutA, FecA, and FhuE (GUDM89), which
are receptors for nutrients usually found in low abundance. The genes of all these proteins have been sequenced, but
topological models are not yet available. Gudmundsdottir et al. (GUDM89)

o?BEid^FeDA by shoving that certain residues of BtuBface the periplasm and by d^^^^ 

S^e a PiSa SSoTSrPldA fuston to be secreted Into the periplasm by additton «f «"^^F^PJ^«^f 
re^ceX.in8Sdltiontosimple binary fusion of anlE^ 

1) ss::ipbd::pldA
2) ss::pldA::ipbd

^ ♦K^ ^«Hr.i«em ft rtnoft not remember how it got there and the 

IV.D. Bacterial Spores as Genetic Packages:

B. subtilis cotC or cotD genes.
[0194] It is generally preferable to use as the genetic package a cell, spore, or virus for which an outer surface protein
which can be engineered to display an IPBD has already been identified. However, as explained in WO90/02809, the
present invention is not limited to such genetic packages, as an outer surface transport signal may be generated by
variegation-and-selection techniques.


IV.E. Genetic Construction and Expression Considerations

[0195] The (i)pbd-osp gene may be: a) completely synthetic, b) a composite of natural and synthetic DNA, or c) a
composite of natural DNA fragments. The important point is that the pbd segment be easily variegated so as to encode
a multitudinous and diverse family of PBDs as previously described. A synthetic ipbd segment is preferred because it
allows greatest control over placement of restriction sites. Primers complementary to regions abutting the osp-ipbd
gene on its 3' flank and to parts of the osp-ipbd gene that are not to be varied are needed for sequencing.
[0196] The sequences of regulatory parts of the gene are taken from the sequences of natural regulatory elements:
a) promoters, b) Shine-Dalgarno sequences, and c) transcriptional terminators. Regulatory elements could also be
designed from knowledge of consensus sequences of natural regulatory regions. The sequences of these regulatory
elements are connected to the coding regions; restriction sites are also inserted in or adjacent to the regulatory regions
to allow convenient manipulation.

[0197] The essential function of the affinity separation is to separate GPs that bear PBDs (derived from IPBD) having
high affinity for the target from GPs bearing PBDs having low affinity for the target. If the elution volume of a GP depends
on the number of PBDs on the GP surface, then a GP bearing many PBDs with low affinity, GP(PBDw), might coelute
with a GP bearing fewer PBDs with high affinity, GP(PBDs). Regulation of the osp-pbd gene preferably is such that
most packages display sufficient PBD to effect a good separation according to affinity. Use of a regulatable promoter
to control the level of expression of the osp-pbd allows fine adjustment of the chromatographic behavior of the
variegated population.

[0198] Induction of synthesis of engineered genes in vegetative bacterial cells has been exercised through the use
of regulated promoters such as lacUV5, trpP, or tac (MANI82). The factors that regulate the quantity of protein
synthesized are sufficiently well understood that a wide variety of heterologous proteins can now be produced in E. coli, B.
subtilis, and other host cells in at least moderate quantities (BETT88). Preferably, the promoter for the osp-ipbd gene
is subject to regulation by a small chemical inducer. For example, the lac promoter and the hybrid trp-lac (tac) promoter
are regulatable with isopropyl thiogalactoside (IPTG). The promoter for the constructed gene need not come from a
natural osp gene; any regulatable bacterial promoter can be used. A non-leaky promoter is preferred.
[0199] The present invention is not limited to a single method of gene design. The osp-ipbd gene need not be
synthesized in toto; parts of the gene may be obtained from nature. One may use any genetic engineering method to
produce the correct gene fusion, so long as one can easily and accurately direct mutations to specific sites in the pbd
DNA subsequence.

[0200] The coding portions of genes to be synthesized are designed at the protein level and then encoded in DNA.
The ambiguity in the genetic code is exploited to allow optimal placement of restriction sites, to create various
distributions of amino acids at variegated codons, to minimize the potential for recombination, and to reduce use of codons
that are poorly translated in the host cell.
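
How the ambiguity of the genetic code permits placement of a restriction site without changing the encoded protein can be illustrated with a short sketch. It assumes the standard genetic code and uses, purely as an example, the dipeptide Gly-Ser and the BamHI recognition sequence GGATCC; the function names are illustrative, and the exhaustive enumeration is practical only for short peptide segments.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_codons(aa):
    """All codons that encode the one-letter amino acid `aa`."""
    return [codon for codon, res in CODON_TABLE.items() if res == aa]


def encodings_with_site(peptide, site):
    """All DNA encodings of `peptide` that contain the sequence `site`."""
    choices = [synonymous_codons(aa) for aa in peptide]
    hits = []
    for combo in product(*choices):
        dna = "".join(combo)
        if site in dna:
            hits.append(dna)
    return hits


if __name__ == "__main__":
    # Example: which encodings of Gly-Ser carry a BamHI site (GGATCC)?
    print(encodings_with_site("GS", "GGATCC"))
```

Of the 24 possible encodings of Gly-Ser, exactly one contains the site. The same enumeration, filtered the other way, would list encodings that avoid an unwanted site or a poorly translated codon.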


IV.F. Structural Considerations

[0201] The design of the amino-acid sequence for the ipbd-osp gene to encode involves a number of structural
considerations. The design is somewhat different for each type of GP. In bacteria, OSPs are not essential, so there is
no requirement that the OSP domain of a fusion have any of its parental functions beyond lodging in the outer
membrane.

[0202] It is desirable that the OSP not constrain the orientation of the PBD domain; this is not to be confused with
lack of constraint within the PBD. Cwirla et al. (CWIR90), Scott and Smith (SCOT90), and Devlin et al. (DEVL90) have
taught that variable residues in phage-displayed random peptides should be free of influence from the phage OSP. We
teach that binding domains having a moderate to high degree of conformational constraint will exhibit higher specificity
and that higher affinity is also possible. Thus, we prescribe picking codons for variegation that specify amino acids that
will appear in a well-defined framework. The nature of the side groups is varied through a very wide range due to the
combinatorial replacement of multiple amino acids. The main-chain conformations of most PBDs of a given class are
very similar. The movement of the PBD relative to the OSP should not, however, be restricted. Thus it is often
appropriate to include a flexible linker between the PBD and the OSP. Such flexible linkers can be taken from naturally
occurring proteins known to have flexible regions. For example, the gIII protein of M13 contains glycine-rich regions
thought to allow the amino-terminal domains a high degree of freedom. Such flexible linkers may also be designed.
Segments of polypeptides that are rich in the amino acids GLY, ASN, SER, and ASP are likely to give rise to flexibility.
Multiple glycines are particularly preferred.

[0203] When we choose to insert the PBD into a surface loop of an OSP such as LamB, OmpA, or M13 gIII protein,

Leireafewconslderetior,sthatdonotartoewhe^^^^ 
exertasome constraining Irifluenca on the PBD; the ends ofthereoa^^^^^ 

insert a highly varied DNA sequence into Jy any moans). 

^co^^rrsr^^^^^ 

part of a PBO in which the structural constrains are f "PPl'^^JJ' "Jf^*^^ .^^ ,„fluence the Immunological 

P2041 It is known that the amino adds '"^^^^t^J^^^^^ or similar OSPs will 

'properties of these epitopes (VAND90). We expect tha ^^Ds 'n^rte j '^^^^ of the PBD. it 

1 irrfluenced by the amino acids of the loop ""J^^^^"^'^^;^^^^ mZ may be taken 

may be necessary to add one or more linker am no e^d^ bej^een the OSP«w n sequences 

Impart the desired degree of flexibility between the and the PBD^ j^,^ 

A low resolutton 3D model suffices. ^^^.tominirfthP mature OSP are the best candidates 

[0207] in the absence of a 3D stn«tum. the amino and ^^^^^^ ^ZnZ IPTO and OSP domains 
iorlnirt.onofthelEWgene.AfuncttonalfusK,nm^req^^^ 

to avoid unwanted interactions between tt,e "*^"^?^2J?^e o^^^^^ and the jebd fragment if needed. 

srtrnr^rbruS^^^^^^^ 

SJLl a bounda-y When subcloning heterolo^^^ 

[0^1 TliecrlterlaforldentliyingOSPdomalnssuttabteforcausI^^^^^ 

KusedtoidemllyandlPBD.Whenldeml1^nganOSP^^^^^ 

Will not appear in the final binding molecule ^^^^^^^^STZX^^^PBD^) the initial genette 
round. The major design concerns are that: a) the O^P-^'^^ w^^rro^^^^^^ easily manipulated. There 
construction be reasonably ^nvenlent. ^^,jn„'l,^SS^ be^n reviewed by Janin 

rc^rrAS^:?r':^rr^^^^ 
:s^s^srara:r^Ho"u^)^^^^^^ 

ment.

[0212] In bacterial OSPs, the major considerations are: a) that the PBD is displayed, and b) that the chimeric protein
not be toxic.

[0213] From topological models of OSPs, we may determine whether the amino or carboxy termini of the OSP is

exposed. It 80, then these are excellent ^'^^'^^^^'.^^^^^.te oS^^^ of plasmtoTci^M81 . CHAR88a.b). Nu- 
[0214] -me jamB gene has been sequenced ^"^"^ ^^J" to study export of proteins in E, 

coll. From various studies. Charbit et aL (CHARB8a,b) ^^^P^; „ ^806; we adopt the numbering 

. ^. a) embedded in the membmne. b) facing t « Penp^m and J ^"^ ^^^^^^ theTuter surface are 

of this model for amino acids In the mature protein. According J tt>l8 m<Wei, se ^ 
defined, including: 1) residues BBthreugh 111^);««;"«^3'^™'^^^^ ^^"p Se^on' of DNA encoding 
[0215] Consider a mini-protein embedded in LamB, for example. Insertion of DNA encoding
G1NXCX5XXXCX10SG12 between codons 153 and 154 of lamB is likely to lead to a wide variety of LamB derivatives
being expressed on the surface of E. coli cells. G1, N2, S11, and G12 are supplied to allow the mini-protein sufficient
orientational freedom that it can interact optimally with the target. Using affinity enrichment (involving, for example,
FACS via a fluorescently labeled target, perhaps through several rounds of enrichment), we might obtain a strain
(named, for example, BEST) that expresses a particular LamB derivative that shows high affinity for the predetermined
target. An octapeptide having the sequence of the inserted residues 3 through 10 from BEST is likely to have an affinity
and specificity similar to that observed in BEST because the octapeptide has an internal structure that keeps the amino
acids in a conformation that is quite similar in the LamB derivative and in the isolated mini-protein.
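
The relation between the twelve-residue insert and the octapeptide can be made concrete with a small sketch. It assumes that the insert reads G1-N2-X3-C4-X5-X6-X7-X8-C9-X10-S11-G12 (our reading of the sequence given above) and that residues 3 through 10 constitute the octapeptide; the example insert sequence is hypothetical and is not a disclosed isolate.

```python
# Assumed insert template: fixed G1, N2, C4, C9, S11, G12; X marks a varied residue.
TEMPLATE = "GNXCXXXXCXSG"


def matches_template(insert, template=TEMPLATE):
    """True if `insert` has the template length and agrees at every fixed position."""
    if len(insert) != len(template):
        return False
    return all(t == "X" or t == r for t, r in zip(template, insert))


def octapeptide(insert):
    """Return inserted residues 3 through 10 (1-based), the disulfide-bridged core."""
    if not matches_template(insert):
        raise ValueError("insert does not fit the assumed template")
    return insert[2:10]


if __name__ == "__main__":
    hypothetical_insert = "GNACDEFGCHSG"  # illustrative only
    print(octapeptide(hypothetical_insert))
```

The extracted segment retains both cysteines, which is why the free octapeptide is expected to keep the conformation it had within the LamB derivative.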
[0216] Fusing one or more new domains to a protein may make the ability of the new protein to be exported from
the cell different from the ability of the parental protein. The signal peptide of the wild-type coat protein may function
for authentic polypeptide but be unable to direct export of a fusion. To utilize the Sec-dependent pathway, one may
need a different signal peptide. Thus, to express and display a chimeric BPTI/M13 gene VIII protein, we found it
necessary to utilize a heterologous signal peptide (that of phoA).

[0217] GPs that display peptides having high affinity for the target may be quite difficult to elute from the target,
particularly a multivalent target. (Bacteria that are bound very tightly can simply multiply in situ.) For phage, one can
introduce a cleavage site for a specific protease, such as blood-clotting Factor Xa, into the fusion OSP protein so that
the binding domain can be cleaved from the genetic package. Such cleavage has the advantage that all resulting phage
have identical OSPs and therefore are equally infective, even if polypeptide-displaying phage can be eluted from the
affinity matrix without cleavage. This step allows recovery of valuable genes which might otherwise be lost. To our
knowledge, no one has disclosed or suggested using a specific protease as a means to recover an
information-containing genetic package or of converting a population of phage that vary in infectivity into phage having identical
infectivity.

IV.G. Synthesis of Gene Inserts 


[0218] The present invention is not limited to any particular method or strategy of DNA synthesis or construction.
Conventional DNA synthesizers may be used, with appropriate reagent modifications for production of variegated DNA
(similar to that now used for production of mixed probes).

[0219] The osp-pbd genes may be created by inserting vgDNA into an existing parental gene, such as the osp-ipbd
shown to be displayable by a suitably transformed GP. The present invention is not limited to any particular method of
introducing the vgDNA, e.g., cassette mutagenesis or single-stranded-oligonucleotide-directed mutagenesis.

IV.H. Operative Cloning Vector

[0220] The operative cloning vector (OCV) is a replicable nucleic acid used to introduce the chimeric ipbd-osp or
pbd-osp gene into the genetic package. When the genetic package is a virus, it may serve as its own OCV. For cells
and spores, the OCV may be a plasmid, a virus, a phagemid, or a chromosome.

IV.I. Transformation of Cells:


[0221] When the GP is a cell, the population of GPs is created by transforming the cells with suitable OCVs. When
the GP is a phage, the phage are genetically engineered and then transfected into host cells suitable for amplification.
When the GP is a spore, cells capable of sporulation are transformed with the OCV while in a normal metabolic state,
and then sporulation is induced so as to cause the OSP-PBDs to be displayed. The present invention is not limited to
any one method of transforming cells with DNA.

[0222] The transformed cells are grown first under non-selective conditions that allow expression of plasmid genes
and then selected to kill untransformed cells. Transformed cells are then induced to express the osp-pbd gene at the
appropriate level of induction. The GPs carrying the IPBD or PBDs are then harvested by methods appropriate to the
GP at hand, generally, centrifugation to pellet GPs and resuspension of the pellets in sterile medium (cells) or buffer
(spores or phage). They are then ready for verification that the display strategy was successful (where the GPs all
display a "test" IPBD) or for affinity selection (where the GPs display a variety of different PBDs).

IV.J. Verification of Display Strategy:

[0223] The harvested packages are tested to determine whether the IPBD is present on the surface. In any tests of
GPs for the presence of IPBD on the GP surface, any ions or cofactors known to be essential for the stability of IPBD
or AfM(IPBD) are included at appropriate levels. The tests can be done, e.g., a) by affinity labeling, b) enzymatically,
c) spectrophotometrically, d) by affinity separation, or e) by affinity precipitation. The AfM(IPBD) in this step is one
picked to have strong affinity (preferably, Kd < 10^-11 M) for the IPBD molecule and little or no affinity for the wtGP.
V. AFFINITY SELECTION OF TARGET-BINDING MUTANTS 
V.A. Affinity Separation Technology, Generally

M . chime* P™»l" ''f'T,'''^Lf^,ZSSXn»3to^^^^ 

Which is nomially bound by BPTI. n^«*i,xrt thnt the comsDondlna binding domain 

are separated from those which do not varleoated population of genetic packages which 

washed away. 

V.B. Affinity Chromatography, Generally

[0228] Affinity column chromatography, batch elution from an affinity matrix, and batch
elution from a plate are very similar and hereinafter will be treated under affinity chromatography.

[0229] If affinity chromatography is to be used, then: 

1) the molecules of the target material must be of sufficient size and chemical reactivity to be applied to a solid

protein binding. 

[0230] Affinity chromatography is the preferred separation means, but FACS, electrophoresis,

properties. 

V.C. Target Materials

molecule that is stable in aqueous solvent may be used as a target.
[0234] Serine proteases such as human neutrophil elastase (HNE) are an especially interesting class of potential
target materials. Serine proteases are ubiquitous in living organisms and play vital roles in processes such as digestion,
blood clotting, fibrinolysis, immune response, fertilization, and post-translational processing of peptide hormones.
Although the role these enzymes play is vital, uncontrolled or inappropriate proteolytic activity can be very damaging.


V.D. Immobilization or Labeling of Target Material

[0235] For chromatography, FACS, or electrophoresis there may be a need to covalently link the target material to
a second chemical entity. For chromatography the second entity is a matrix, for FACS the second entity is a fluorescent
dye, and for electrophoresis the second entity is a strongly charged molecule. In many cases, no coupling is required
because the target material already has the desired property of: a) immobility, b) fluorescence, or c) charge. In other
cases, chemical or physical coupling is required.

[0236] It is not necessary that the actual target material be used in preparing the immobilized or labeled analogue
that is to be used in affinity separation; rather, suitable reactive analogues of the target material may be more
convenient. Target materials that do not have reactive functional groups may be immobilized by first creating a reactive
functional group through the use of some powerful reagent, such as a halogen. In some cases, the reactive groups of the
actual target material may occupy a part on the target molecule that is to be left undisturbed. In that case, additional
functional groups may be introduced by synthetic chemistry.

[0237] Two very general methods of immobilization are widely used. The first is to biotinylate the compound of interest
and then bind the biotinylated derivative to immobilized avidin. The second method is to generate antibodies to the
target material, immobilize the antibodies by any of numerous methods, and then bind the target material to the
immobilized antibodies. Use of antibodies is more appropriate for larger target materials; small targets (those comprising,
for example, ten or fewer non-hydrogen atoms) may be so completely engulfed by an antibody that very little of the
target is exposed in the target-antibody complex.
[0238] Non-covalent immobilization of hydrophobic molecules without resort to antibodies may also be used. A
compound, such as 2,3,3-trimethyldecane, is blended with a matrix precursor, such as sodium alginate, and the mixture is
extruded into a hardening solution. The resulting beads will have 2,3,3-trimethyldecane dispersed throughout and
exposed on the surface.

[0239] Other immobilization methods depend on the presence of particular chemical functionalities. A polypeptide
will present -NH2 (N-terminal; Lysines), -COOH (C-terminal; Aspartic Acids; Glutamic Acids), -OH (Serines; Threonines;
Tyrosines), and -SH (Cysteines). For the reactivity of amino acid side chains, see CREI84. A polysaccharide has free
-OH groups, as does DNA, which has a sugar backbone.

[0240] Matrices suitable for use as support materials include polystyrene, glass, agarose and other chromatographic
supports, and may be fabricated into beads, sheets, columns, wells, and other forms as desired.
[0241] Early in the selection process, relatively high concentrations of target materials may be applied to the matrix
to facilitate binding; target concentrations may subsequently be reduced to select for higher affinity SBDs.

V.E. Elution of Lower Affinity PBD-Bearing Genetic Packages

[0242] The population of GPs is applied to an affinity matrix under conditions compatible with the intended use of
the binding protein and the population is fractionated by passage of a gradient of some solute over the column. The
process enriches for PBDs having affinity for the target and for which the affinity for the target is least affected by the
eluants used. The enriched fractions are those containing viable GPs that elute from the column at greater concentration
of the eluant.

[0243] The eluants preferably are capable of weakening noncovalent interactions between the displayed PBDs and
the immobilized target material. Preferably, the eluants do not kill the genetic package; the genetic message
corresponding to successful mini-proteins is most conveniently amplified by reproducing the genetic package rather than
by in vitro procedures such as PCR. The list of potential eluants includes salts (including Na+, NH4+, Rb+, SO4--,
H2PO4-, citrate, K+, Li+, Cs+, HSO4-, CO3--, Ca++, Sr++, Cl-, PO4---, HCO3-, Mg++, Ba++, Br-, HPO4-- and acetate),
acid, heat, compounds known to bind the target, and soluble target material (or analogues thereof).

[0244] The uneluted genetic packages contain DNA encoding binding domains which have a sufficiently high affinity
for the target material to resist the elution conditions. The DNA encoding such successful binding domains may be
recovered in a variety of ways. Preferably, the bound genetic packages are simply eluted by means of a change in the
elution conditions. Alternatively, one may culture the genetic package in situ, or extract the target-containing matrix
with phenol (or other suitable solvent) and amplify the DNA by PCR or by recombinant DNA techniques. Additionally,
if a site for a specific protease has been engineered into the display vector, the specific protease is used to cleave the
binding domain from the GP.

[0245] Nonspecific binding to the matrix, etc., may be identified or reduced by techniques well known in the affinity
separation art.

V.F. Recovery of packages:

[0246] Recovery of packages that display binding to an affinity column may be achieved in several ways, such as:

1) collect fractions eluted from the column with a gradient as described above; fractions eluting later will
contain GPs more enriched for genes encoding PBDs with high affinity for the column,

2) elute the column with the target material in soluble form,

3) flood the matrix with a nutritive medium and grow the desired packages in situ,

4) remove parts of the matrix and use them to inoculate growth medium,

5) chemically or enzymatically degrade the linkage holding the target to the matrix so that GPs still bound to target
are eluted, or

6) degrade the packages and recover DNA with phenol or other suitable solvent; the recovered DNA is used to
transform cells that regenerate GPs.

It is possible to utilize combinations of these methods. It should be remembered that what we want to recover from the
affinity matrix is not the GPs per se, but the information in them. Recovery of viable GPs is very strongly preferred, but
recovery of genetic material is essential. If cells, spores, or virions bind irreversibly to the matrix but are not killed, we
can recover the information through in situ cell division, germination, or infection, respectively. Proteolytic degradation
of the packages and recovery of DNA is not preferred.

V.G. Amplifying the Enriched Packages 

L oi^bd gene are recovered from the GP, and introduced into a new. viable host. 
V.H. Characterizing the Putative SBDs:

binding may be made for each free SBD protein by any suitable method. 

mm If we find that the binding Is not yet sufficient, we decide which residues of the ^^^^I^^r^^^^ 
S^ext If fte binding is s^^^^ 

binding protein. 

V.I. Joint selections: 

[0250] One may modify the affinity separation of the method described to select a molecule that binds to material A
but not to material B, or that binds to both A and B, either alternatively or simultaneously.

V.J. Engineering of Antagonists

1 and SBD-2 are binding domains that shw high afnntty for t^^^^^ 
of being an antagonist to the target enzyme.
VI. EXPLOITATION OF SUCCESSFUL BINDING DOMAINS AND CORRESPONDING DNAS

[0253] While the SBD may be produced by recombinant DNA techniques, an advantage inhering from the use of a
mini-protein as an IPBD is that it is likely that the derived SBD will also behave like a mini-protein and will be obtainable
by means of chemical synthesis. (The term "chemical synthesis", as used herein, includes the use of enzymatic agents
in a cell-free environment.)

[0254] It is also to be understood that mini-proteins obtained by the method of the present invention may be taken
as lead compounds for a series of homologues that contain non-naturally occurring amino acids and groups other than
amino acids. For example, one could synthesize a series of homologues in which each member of the series has one
amino acid replaced by its D enantiomer. One could also make homologues containing constituents such as β-alanine,
aminobutyric acid, 3-hydroxyproline, 2-aminoadipic acid, N-ethylasparagine, norvaline, etc.; these would be tested for
binding and other properties of interest, such as stability and toxicity.
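
The series of D-enantiomer homologues described above can be enumerated with a few lines of code. This is a minimal sketch: it uses the common (but here assumed) convention of writing a D residue as a lowercase one-letter code, skips glycine because it is achiral, and takes the hexapeptide CHPQFC from Example I purely as an illustration.

```python
def d_enantiomer_series(peptide):
    """Return homologues of `peptide`, each with exactly one residue in the D form.

    D residues are written in lowercase; glycine (G) is achiral and is skipped.
    """
    series = []
    for i, residue in enumerate(peptide):
        if residue == "G":
            continue
        series.append(peptide[:i] + residue.lower() + peptide[i + 1:])
    return series


if __name__ == "__main__":
    for homologue in d_enantiomer_series("CHPQFC"):
        print(homologue)
```

Each member of the resulting list would then be synthesized and tested for binding, stability, and toxicity as the paragraph describes.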

[0255] Peptides may be chemically synthesized either in solution or on supports. Various combinations of stepwise
synthesis and fragment condensation may be employed.
[0256] During synthesis, the amino acid side chains are protected to prevent branching. Several different protective
groups are useful for the protection of the thiol groups of cysteines:

1) 4-methoxybenzyl (MBzl; Mob) (NISH82; ZAFA88), removable with HF;

2) acetamidomethyl (Acm) (NISH82; NISH86; BECK89c), removable with iodine; mercury ions (e.g., mercuric
acetate); silver nitrate; and

3) S-para-methoxybenzyl (HOUG84).

[0257] Other thiol protective groups may be found in standard reference works such as Greene, PROTECTIVE
GROUPS IN ORGANIC SYNTHESIS (1981).
[0258] Once the polypeptide chain has been synthesized, disulfide bonds must be formed. Possible oxidizing agents
include air (HOUG84; NISH86), ferricyanide (NISH82; HOUG84), iodine (NISH82), and performic acid (HOUG84).
Temperature, pH, solvent, and chaotropic chemicals may affect the course of the oxidation.

[0259] A large number of micro-proteins with a plurality of disulfide bonds have been chemically synthesized in
biologically active form: conotoxin G1 (13 AA, 4 Cys) (NISH82); heat-stable enterotoxin ST (18 AA, 6 Cys) (HOUG84);
analogues of ST (BHAT86); Ω-conotoxin GVIA (27 AA, 6 Cys) (NISH86; RIVI87b); Ω-conotoxin MVIIA (27 AA, 6 Cys)
(OLIV87b); α-conotoxin SI (13 AA, 4 Cys) (ZAFA88); μ-conotoxin IIIa (22 AA, 6 Cys) (BECK89c, CRUZ89, HATA90).
Sometimes, the polypeptide naturally folds so that the correct disulfide bonds are formed. Other times, it must be
helped along by use of a differently removable protective group for each pair of cysteines.
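
Why directed pairing is sometimes needed becomes clear from a count of the possible disulfide arrangements. As a minimal sketch (the function name is illustrative), the number of ways to pair n cysteines completely into disulfides is (n-1) x (n-3) x ... x 1:

```python
def disulfide_pairings(n_cys):
    """Number of ways to pair `n_cys` cysteines completely into disulfide bonds."""
    if n_cys < 0 or n_cys % 2:
        raise ValueError("a complete pairing needs an even, non-negative number of cysteines")
    count = 1
    for k in range(n_cys - 1, 0, -2):
        count *= k  # the first unpaired cysteine may choose any of k partners
    return count


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, disulfide_pairings(n))
```

A micro-protein with 4 Cys thus has 3 possible pairings and one with 6 Cys has 15, only one of which is the native arrangement; a single-disulfide micro-protein has no such ambiguity.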

[0260] The successful binding domains of the present invention may, alone or as part of a larger protein, be used
for any purpose for which binding proteins are suited, including isolation or detection of target materials. In furtherance
of this purpose, the novel binding proteins may be coupled directly or indirectly, covalently or noncovalently, to a label,
carrier, or support.

[0261] When used as a pharmaceutical, the novel binding proteins may be contained with suitable carriers or
adjuvants.


EXAMPLE I

DESIGN AND MUTAGENESIS OF A CLASS 1 MICRO-PROTEIN

[0262] To obtain a library of binding domains that are conformationally constrained by a single disulfide, we insert
DNA coding for the following family of micro-proteins into the gene coding for a suitable OSP.



where the connecting bracket indicates disulfide bonding. Disulfides normally do not form between cysteines that are
consecutive on the polypeptide chain. One or more of the residues indicated above as Xn will be varied extensively to
obtain novel binding. There may be one or more amino acids that precede X1 or follow X6; however, the residues before
X1 or after X6 will not be significantly constrained by the diagrammed disulfide bridge, and it is less advantageous to
vary these remote, unbridged residues. The last X residue is connected to the OSP of the genetic package.
[0263] X1, X2, X3, X4, X5, and X6 can be varied independently; i.e., a different scheme of variegation could be used
which encodes the substitution set [F, S, Y, C, L, P, H, R, I, T, N, V, A, D, G].
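
The substitution sets of the variegated codons, and the resulting library size, can be checked by direct enumeration. The sketch below assumes the standard genetic code and assumes a library with four NNT positions and two NNG positions, which is our reading of the phage library described later in this Example (about 8.55 x 10^6 micro-proteins); the function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Degenerate base symbols used here: N = any base, K = G or T.
DEGENERATE = {"N": "TCAG", "K": "GT", "T": "T", "C": "C", "A": "A", "G": "G"}


def substitution_set(pattern):
    """Return (sorted amino acids, stop_allowed) for a degenerate codon such as 'NNT'."""
    codons = ["".join(p) for p in product(*(DEGENERATE[ch] for ch in pattern))]
    residues = {CODON_TABLE[c] for c in codons}
    return sorted(residues - {"*"}), "*" in residues


if __name__ == "__main__":
    nnt, nnt_stop = substitution_set("NNT")
    nng, nng_stop = substitution_set("NNG")
    print("NNT:", "".join(nnt), "stop" if nnt_stop else "no stop")
    print("NNG:", "".join(nng), "stop" if nng_stop else "no stop")
    # Assumed library: four NNT positions and two NNG positions.
    print("distinct micro-proteins:", len(nnt) ** 4 * len(nng) ** 2)
```

NNT yields 15 amino acids from 16 codons with no stop codon, whereas NNG yields 13 amino acids, no cysteine, and one stop codon; 15^4 x 13^2 = 8,555,625, in agreement with the figure of about 8.55 x 10^6.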

[0266] The advantages of the NWT over the NNK c»don '^^^ NWT or^NNK 
Ldone incmasad. Tables 10 and 130 compajB libmrles ' "^^^ll . iV amino-acid 
codons-Nm-encodeslBd^e^^^^^^^ 

larger number of independent transfonnants. ^ l^^3 g m 

peptidase-f cteavra after Si8- We replace this segment with 

.._....^..__..«„u_c...«r,AHr««n<*lmDaretheDha5»forinfecthrityJtl8i«efultolnserta^^^^ 

cl^;g';;te (^iQRj^irbet^en the PBD and the mature III protein; this not only allows onemauon-. 

the PBD, but also allows cleavage of the PBD from the GP. i^n^ne F S Y C L P. H. R. V, T, N. V. A. 

[0268] AphagellbmorinwhtehX,.X,X,.and>^a^^^^^ 
D.&G) and in which X3 and are encoded byNNG(atow^^ 

This library displays about 8.55x IOB micro-proteins ^'^^'fJ^y^^J,;" J^^^^ 
th.rdandfourthvariablepositlons{thecentralpositlon8ofthedisulhde^jlosedloop)aiieasunp 

of cysteines at these positions. „„k „*whioh««iiddlsDlav one of 10^2 random pentadecapep- 

possible. The sequences of these isolates is given In Table 820. .Miar enough to Devlin's 
p«70] WerecinlzedthatourTNailbraryshouldlndudeapi^jm^ 

'E- and -F" peptides to have the potential of exhibiting «t~P^y'f -'''^^"S which has the 
tlL seq^nce common to all members Of theTNM 

potential for fomilng a disulfide bridge with a span ot four toHowed ^ = ^Jg^^^^ „ ^ streptavidin was 
ng of biotin).AstDck solution of 1 mg per ml .n P^S ^ntalnlng 0^«£We ^^^^ 

to each250^Lcapacitywellof immulon WP'«^:^^'7f^^^J?S^4^^^ 1 hour. Prior to use In 

with 250 nL of PBS containing BSA at a fJ^'^J^'?" ""^jj^ coin ng 0.1% Tween. 

a phage binding assay the wells are washed rapldhr 5 tim«^ 

[0272] TO each StrAv-coated wen is aoaeo i ovi |«. u. u...u„ j v — • - L__eraturB followed by removal 
UiyotphageCIOiipfu-softheTh^l^ra^^^^^^^ 
of the non4Dound phage and 1 0 rapid w^hes wrth PK OJ^^^^ 
aand6to«movenon-specfflcbindJg.Thebouj^^^^^^^ 

rnrp^ge-rreirfrar^^^^^ 
KTabrrs^^sCoSd^^^^^ 

Woi:i^eLf all Of the putauve micr^^^^^^ 
variable residue before the first cysteine could have contained any of {F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G}; the residues
selected were {Y,H,L,D,N} while phage HPQ has P. The variable residue after the second cysteine also could have
had {F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G}; the residues selected were {P,S,G,R,V} while phage HPQ has Q. The relatively
poor binding of phage HPQ could be due to P4 or to Q11 or both.
[0275] In a control experiment, the TN2 library was screened in an identical manner to that shown above but with
the target protein being the blocking agent BSA. Following three rounds of binding, elution, and amplification, sixteen
random phage plaques were picked and sequenced. Half of the clones demonstrated a lack of insert (8/16); the other
half had the sequences shown in Table 839. There is no consensus for this collection.

[0276] We have displayed a related micro-protein, HPQ6, on phage. It is identical to HPQ except for the replacement
of CHPQFC with CHPQFPRC (see Table 820). When displayed, HPQ6 had a substantially stronger affinity for streptavidin
than either HPQ or Devlin's "F" isolate. (Devlin's "E" isolate was not studied.) Treatment with dithiothreitol (DTT)
markedly reduced the binding of HPQ6 phage (but not control phage) to streptavidin, suggesting that the presence of
a disulfide bridge within the displayed peptide was required for good binding. In view of the results of the screening of
the TN2 library, it is likely that the binding of phage HPQ6 could be further improved by changing P4 to one of
{Y,H,L,D,N} and/or changing Q13 to one of {P,S,G,R,V}.

EXAMPLE II 

A CYS::HELIX::TURN::STRAND::CYS UNIT

[0277] The parental Class 2 micro-protein may be a naturally-occurring Class 2 micro-protein. It may also be a domain
of a larger protein whose structure satisfies or may be modified so as to satisfy the criteria of a Class 2 micro-protein.
The modification may be a simple one, such as the introduction of a cysteine (or a pair of cysteines) into the base of
a hairpin structure so that the hairpin may be closed off with a disulfide bond, or a more elaborate one, such as the
modification of intermediate residues so as to achieve the hairpin structure. The parental Class 2 micro-protein may
also be a composite of structures from two or more naturally-occurring proteins, e.g., an α helix of one protein and a
β strand of a second protein.

[0278] One micro-protein motif of potential use comprises a disulfide loop enclosing a helix, a turn, and a return
strand. Such a structure could be designed or it could be obtained from a protein of known 3D structure. Scorpion
neurotoxin, variant 3, (ALMA83a, ALMA83b) (hereafter ScorpTx) contains a structure diagrammed in Figure 1 that
comprises a helix (residues N22 through N33), a turn (residues 33 through 35), and a return strand (residues 36 through
41). ScorpTx contains disulfides that join residues 12-65, 16-41, 25-46, and 29-48. CYS25 and CYS41 are quite close
and could be joined by a disulfide without deranging the main chain. Figure 1 shows CYS25 joined to CYS41. In addition,
CYS29 has been changed to GLN. It is expected that a disulfide will form between 25 and 41 and that the helix shown
will form; we know that the amino-acid sequence shown is highly compatible with this structure. The presence of GLY35,
GLY36, and GLY39 gives the turn and extended strand sufficient flexibility to accommodate any changes needed around
CYS41 to form the disulfide.

[0279] From examination of this structure (as found in entry 1SN3 of the Brookhaven Protein Data Bank), we see
that the following sets of residues would be preferred for variegation:

SET 1

[0280]

     Parental   Variegated   Allowed              AA seqs/
                codon        amino acids          DNA seqs
  1) T27        NNG          L²R²MVSPTAQKEWG      13/15
  2) E28        VHG          LMVPTAQKE            9/9
  3) A31        VHG          LMVPTAQKE            9/9
  4) K32        VHG          LMVPTAQKE            9/9
  5) G24        NNG          L²R²MVSPTAQKEWG      13/15
  6) E23        VHG          LMVPTAQKE            9/9
  7) Q34        VAS          HQNKDE               6/6

Note: Exponents on amino acids indicate multiplicity of codons.

[0281] Positions 27, 28, 31, 32, 24, and 23 comprise one face of the helix. At each of these locations we have picked
a variegating codon that a) includes the parental amino acid, b) includes a set of residues having a predominance of
helix-favoring residues, c) provides for a wide variety of amino acids, and d) leads to as even a distribution as possible.
Position 34 is part of a turn. The side group of residue 34 could interact with molecules that contact the side groups of
residues 27, 28, 31, 32, 24, and 23. Thus we allow variegation here and provide amino acids that are compatible with
turns. The variegation shown leads to 6.65×10^6 amino-acid sequences encoded by 8.85×10^6 DNA sequences.
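
The counts of amino-acid and DNA sequences for a variegation scheme follow mechanically from the variegated codons. The Python sketch below is an illustration only and is not part of the original disclosure. It assumes the standard genetic code and the IUPAC ambiguity codes, and it assumes that the seven Set 1 codons are NNG, VHG, VHG, VHG, NNG, VHG, and VAS (positions 27, 28, 31, 32, 24, 23, and 34). All function and variable names are hypothetical.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes used to write variegated codons.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def variegate(vg_codon):
    """Map each encoded amino acid (or '*') to its number of codons."""
    pools = [IUPAC[base] for base in vg_codon.upper()]
    return Counter(CODE["".join(c)] for c in product(*pools))


def aa_and_dna_counts(vg_codon):
    """Return (#distinct amino acids, #stop-free codons) for one codon."""
    counts = variegate(vg_codon)
    counts.pop("*", None)  # stop codons are not counted
    return len(counts), sum(counts.values())


def library_size(vg_codons):
    """Multiply the per-position counts over all variegated positions."""
    n_aa = n_dna = 1
    for codon in vg_codons:
        aa, dna = aa_and_dna_counts(codon)
        n_aa *= aa
        n_dna *= dna
    return n_aa, n_dna


# Assumed Set 1 codons for positions 27, 28, 31, 32, 24, 23, 34.
SET1_CODONS = ["NNG", "VHG", "VHG", "VHG", "NNG", "VHG", "VAS"]
```

Under these assumptions, `library_size(SET1_CODONS)` gives 6,652,854 amino-acid sequences and 8,857,350 stop-free DNA sequences, in line with the approximate figures quoted above.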

SET 2

[0282]

     Parental   Variegated   Allowed               AA seqs/
                codon        amino acids           DNA seqs
  1) D26        VHS          L²IMV²P²T²A²HQNKDE    13/18
  2) T27        NNG          L²R²MVSPTAQKEWG       13/15
  3) K30        VHG          KEQPTALMV             9/9
  4) A31        VHG          KEQPTALMV             9/9
  5) K32        VHG          LMVPTAQKE             9/9
  6) S37        RRT          SNDG                  4/4
  7) Y38        NHT          YSFHPLNTIDAV          9/9

[0283] Positions 26, 27, 30, 31, and 32 are variegated so as to enhance helix-favoring amino acids in the population.
Residues 37 and 38 are in the return strand so that we pick different variegation codons. This variegation allows
4.43×10^6 amino-acid sequences and 7.08×10^6 DNA sequences. Thus a library that embodies this scheme can be
sampled very efficiently.
EXAMPLE III

DESIGN AND MUTAGENESIS OF CLASS 3 MICRO-PROTEIN

Two Disulfide Bond Parental Micro-Proteins

[0284] Micro-proteins with two disulfide bonds may be modelled after the α-conotoxins, e.g., GI, GIA, GII, MI, and
SI. These have the following conserved structure:

(1-2 AAs)-C-C-(3 AAs)-C-(5 AAs)-C-(0-5 AAs)

[0285] Hashimoto et al. (HASH85) reported synthesis of twenty-four analogues of α-conotoxins GI, GII, and MI. Using
the numbering scheme for GI (CYS at positions 2, 3, 7, and 13), Hashimoto et al. reported alterations at 4, 8, 10, and
12 that allow the proteins to be toxic. Almquist et al. (ALMQ89) synthesized [des-GLU1] α-conotoxin GI and twenty
analogues. They found that substituting GLY for PRO5 gave rise to two isomers, perhaps related to different disulfide
bonding. They found a number of substitutions at residues 8 through 11 that allowed the protein to be toxic. Zafaralla
et al. (ZAFA88) found that substituting PRO at position 9 gives an active protein. Each of the groups cited used only
in vivo toxicity as an assay for the activity. From such studies, one can infer that an active protein has the parental 3D
structure, but one cannot infer that an inactive protein lacks the parental 3D structure.

[0286] Pardi et al. (PARD89) determined the 3D structure of α-conotoxin GI obtained from venom by NMR. Kobayashi
et al. (KOBA89) have reported a 3D structure of synthetic α-conotoxin GI from NMR data which agrees with that of
PARD89. We refer to Figure 5 of Pardi et al.

[0287] Residue GLU1 is known to accommodate GLU, ARG, and ILE in known analogues or homologues. A preferred
variegation codon is NNG that allows the set of amino acids [L²R²MVSPTAQKEWG<stop>]. From Figure 5 of Pardi
et al. we see that the side group of GLU1 projects into the same region as the strand comprising residues 9 through
12. Residues 2 and 3 are cysteines and are not to be varied. The side group of residue 4 points away from residues
9 through 12; thus we defer varying this residue until a later round. PRO5 may be needed to cause the correct disulfides
to form. When GLY was substituted here, the peptide folded into two forms, neither of which is toxic. It is allowed to vary
PRO5, but not preferred in the first round.
[0288] No substitutions at ALA6 have been reported. A preferred variegation codon is RMG which gives rise to ALA,
THR, LYS, and GLU (small hydrophobic, small hydrophilic, positive, and negative). CYS7 is not varied. We prefer to
leave GLY8 as is; the analogue having ALA8 is toxic. Homologous proteins having various amino
acids at position 9 are toxic; thus, we use an NNT variegation codon which allows FS²YCLPHRITNVADG. We use
NNT at positions 10, 11, and 12 as well. At position 14, following the fourth CYS, we allow ALA, THR, LYS, or GLU
(via an RMG codon). This variegation allows 1.053×10^7 amino-acid sequences, encoded by 1.68×10^7 DNA sequences.
Libraries having 2.0×10^7, 3.0×10^7, and 5.0×10^7 independent transformants will, respectively, display ≈70%, ≈83%, and
≈95% of the allowed sequences. Other variegations are also appropriate. Concerning α-conotoxins, see, inter alia,
ALMQ89, CRUZ85, GRAY83, GRAY84, and PARD89.
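
The percentages of allowed sequences displayed by a library of a given size can be estimated by treating each independent transformant as a random draw from the set of DNA sequences. The sketch below is an illustration, not the stated method of the disclosure. It assumes that all DNA sequences are equally likely, so that the expected fraction present at least once is 1 - exp(-T/N) for T transformants and N sequences, and it takes N to be 1.68×10^7. The function name is hypothetical.

```python
import math


def fraction_displayed(n_transformants, n_dna_sequences):
    """Expected fraction of sequences sampled at least once.

    Assumes every DNA sequence is equally likely, so the number of copies
    of any one sequence is approximately Poisson with mean T/N.
    """
    return 1.0 - math.exp(-n_transformants / n_dna_sequences)


N_DNA = 1.68e7  # assumed number of DNA sequences for this variegation

for transformants in (2.0e7, 3.0e7):
    pct = 100.0 * fraction_displayed(transformants, N_DNA)
    print(f"{transformants:.1e} transformants: about {pct:.0f}% displayed")
```

Under this assumption, libraries of 2.0×10^7 and 3.0×10^7 transformants come out at roughly 70% and 83%, consistent with the figures in the text.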

[0289] The parental micro-protein may instead be one of the proteins designated "Hybrid-I" and "Hybrid-II" by Pease
et al. (PEAS90); cf. Figure 4 of PEAS90. One preferred set of residues to vary for either protein consists of:



    Parental   Variegated   Allowed              AA seqs/
               codon        amino acids          DNA seqs
    A5         RVT          ADGTNS               6/6
    P6         VYT          PTVLIA               6/6
    E7         RRS          EDNKSRG²             7/8
    T8         VHG          TPALMVQKE            9/9
    A9         VHG          ATPLMVQKE            9/9
    A10        RMG          AEKT                 4/4
    K12        VHG          KQETPALMV            9/9
    Q16        NNG          L²R²SWPQMTKVAEG      13/15

This provides 9.55×10^6 amino-acid sequences encoded by 1.26×10^7 DNA sequences. A library comprising 5.0×10^7
transformants allows expression of 98.2% of all possible sequences. At each position, the parental amino acid is
allowed.

[0290] At position 5 we provide amino acids that are compatible with a turn. At position 6 we allow ILE and VAL
because they have branched β carbons and make the chain rigid. At position 7 we allow ASP, ASN, and SER that
often appear at the amino termini of helices. At positions 8 and 9 we allow several helix-favoring amino acids (ALA,
LEU, MET, GLN, GLU, and LYS) that have differing charges and hydrophobicities because these are part of the helix
proper. Position 10 is further around the edge of the helix, so we allow a smaller set (ALA, THR, LYS, and GLU). This
set not only includes 3 helix-favoring amino acids plus THR that is well tolerated but also allows positive, negative,
and neutral hydrophilic. The side groups of 12 and 16 project into the same region as the residues already recited. At
these positions we allow a wide variety of amino acids with a bias toward helix-favoring amino acids.
[0291] The parental micro-protein may instead be a polypeptide composed of residues 9-24 and 31-40 of aprotinin
and possessing two disulfides (Cys9-Cys22 and Cys14-Cys38). Such a polypeptide would have the same disulfide
bond topology as α-conotoxin, and its two bridges would have spans of 12 and 17, respectively.

[0292] Residues 23, 24 and 31 are variegated to encode the amino acid residue set [G,S,R,D,N,H,P,T,A] so that a
sequence that favors a turn of the necessary geometry is found. We use trypsin or anhydrotrypsin as the affinity
molecule to enrich for GPs that display a micro-protein that folds into a stable structure similar to BPTI in the P1 region.

Three Disulfide Bond Parental Micro-Proteins

[0293] The cone snails (Conus) produce venoms (conotoxins) which are 10-30 amino acids in length and exceptionally
rich in disulfide bonds. They are therefore archetypal micro-proteins. Novel micro-proteins with three disulfide
bonds may be modelled after the μ-(GIIIA, GIIIB, GIIIC) or ω-(GVIA, GVIB, GVIC, GVIIA, GVIIB, MVIIA, MVIIB, etc.)
conotoxins. The μ-conotoxins have the following conserved structure:

        1 2         3         1'        2'3'
(2 AAs)-C-C-(5 AAs)-C-(4 AAs)-C-(4 AAs)-C-C-AA

[0294] No 3D structure of a μ-conotoxin has been published. Hidaka et al. (HIDA90) have established the connectivity
of the disulfides. The following diagram depicts geographutoxin I (also known as μ-conotoxin GIIIA).

The connection from R19 to C20 could go over or under the strand from Q14 to C15. One preferred form of variegation
is to vary the residues in one loop. Because the longest loop contains only five amino acids, it is appropriate to also
vary the residues connected to the cysteines that form the loop. For example, we might vary residues 5 through 9 plus
2, 11, 19, and 22. Another useful variegation would be to vary residues 11-14 and 16-19, each through eight amino
acids. Concerning μ-conotoxins, see BECK89b, BECK89c, CRUZ89, and HIDA90.
[0295] The ω-conotoxins may be represented as follows:

1         2         3 1'          2'          3'
C-(6 AAs)-C-(6 AAs)-C-C-(2-3 AAs)-C-(4-6 AAs)-C

The King Kong peptide has the same disulfide arrangement as the ω-conotoxins but a different biological activity.
Woodward et al. (WOOD90) report the sequences of three homologous proteins from C. textile. Within the mature
toxin domain, only the cysteines are conserved. The spacing of the cysteines is exactly conserved, but no other position
has the same amino acid in all three sequences and only a few positions show even pair-wise matches. Thus we
conclude that all positions (except the cysteines) may be substituted freely with a high probability that a stable disulfide
structure will form. Concerning ω-conotoxins, see HILL89 and SUNX87.

[0296] Another micro-protein which may be used as a parental binding domain is the Cucurbita maxima trypsin
inhibitor I (CMTI-I); CMTI-III is also appropriate. They are members of the squash family of serine protease inhibitors,
which also includes inhibitors from summer squash, zucchini, and cucumbers (WIEC85). McWherter et al. (MCWH89)
describe synthetic sequence-variants of the squash-seed protease inhibitors that have affinity for human leukocyte
elastase and cathepsin G. Of course, any member of this family might be used.

[0297] CMTI-I is one of the smallest proteins known, comprising only 29 amino acids held in a fixed conformation
by three disulfide bonds. The structure has been studied by Bode and colleagues using both X-ray diffraction (BODE89)
and NMR (HOLA89a,b). CMTI-I is of ellipsoidal shape; it lacks helices or β-sheets, but consists of turns and connecting
short polypeptide stretches. The disulfide pairing is Cys3-Cys20, Cys10-Cys22 and Cys16-Cys28. In the CMTI-I:trypsin
complex studied by Bode et al., 13 of the 29 inhibitor residues are in direct contact with trypsin; most of them are in
the primary binding segment Val2(P4)-Glu9(P4') which contains the reactive site bond Arg5(P1)-Ile6 and is in a
conformation observed also for other serine proteinase inhibitors.

[0298] CMTI-I has a Ki for trypsin of ≈1.5×10^-12 M. McWherter et al. suggested substitution of "moderately bulky
hydrophobic groups" at P1 to confer HLE specificity. They found that a wider set of residues (VAL, ILE, LEU, ALA,
PHE, MET, and GLY) gave detectable binding to HLE. For cathepsin G, they expected bulky (especially aromatic) side
groups to be strongly preferred. They found that PHE, LEU, MET, and ALA were functional by their criteria; they did
not test TRP, TYR, or HIS. (Note that ALA has the second smallest side group available.)

[0299] A preferred initial variegation strategy would be to vary some or all of the residues ARG1, VAL2, PRO4, ARG5,
ILE6, LEU7, MET8, GLU9, LYS11, HIS25, GLY26, TYR27, and GLY29. If the target were HNE, for example, one could
synthesize DNA embodying the following possibilities:

Allowed . . segs/ 



gareatftJ. — 


vg 

CodQii_ 




VNT 


VM12 


NHT 


PRO4 


VYT 


AR63 


VNT 




NMK 


XIEU7 


VHG 


TYR„ 


HAS 



RSLPHrnJV»DG 12/12 

VHiPTtBND ®/8 
PLTIAV 

•BShBBrrsrrxxi ' 12/12 

all 20 20/31 

LQMKVB 6/6 



7/8 



This allows about 5.81×10^6 amino-acid sequences encoded by about 1.03×10^7 DNA sequences. A library comprising

wouidV ^ of the possible sequences. Other variegation schemes could 

also be used. 

[0300] Other inhibitors of this family include:

Trypsin inhibitor I from Citrullus vulgaris (OTLE87),
Trypsin inhibitor II from Bryonia dioica (OTLE87),
Trypsin inhibitor I from Cucurbita maxima (in OTLE87),
Trypsin inhibitor III from Cucurbita maxima (in OTLE87),
Trypsin inhibitor IV from Cucurbita maxima (in OTLE87),
Trypsin inhibitor II from Cucurbita pepo (in OTLE87),
Trypsin inhibitor III from Cucurbita pepo (in OTLE87),
Trypsin inhibitor IIb from Cucumis sativus (in OTLE87),
Trypsin inhibitor IV from Cucumis sativus (in OTLE87),
Trypsin inhibitor II from Ecballium elaterium (FAVE89), and
Inhibitor CM-1 from Momordica repens (in OTLE87).

[0301] Another micro-protein that may be used as an initial potential binding domain is the heat-stable enterotoxins
S fro^ ^m^^^^ p LL Citrobacterfreundil. and other bacteHa (GUAR89) T^ese micro-protel^ 

Am ^wnto^e s^^^^ f rom E coTfiii^are extremely stable. Works related to synthesis, cloning, expression and 
Ze^ S^es^^^^^^ SEKI85!lHiM87. TAKA86, TAKE90. THOM85a.b YOSHEJ. DA1X90. 

SJK ^Rmr. GUZM89. GUZM90. H0UGB4. KUB089. iCUPE90. 0KAMB7. OKAMSB. and OKAi^90. 

EXAMPLE IV 

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF CU(II), ONE CYSTEINE, TWO HISTIDINES, AND ONE
METHIONINE

[0302] Sequences such as HIS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-CYS and
CYS-ASN-GLY-MET-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-HIS-ASN-GLY-HIS are likely to form structures as shown in the diagram:

        Xaa7--Xaa8                         Xaa7--Xaa8
       /          \                       /          \
    Xaa6          Xaa9                 Xaa6          Xaa9
     |              |                   |              |
    Xaa5          Xaa10                Xaa5          Xaa10
       \          /                       \          /
        MET4  HIS11                        MET4  HIS11
       /    \ /    \                      /    \ /    \
    GLY3     Cu     ASN12              GLY3     Cu     ASN12
     |      /  \      |                 |      /  \      |
    ASN2-HIS1  CYS14-GLY13             ASN2-CYS1  HIS14-GLY13
          |      |                           |      |
         NH3    COO                         NH3    COO

Other arrangements of HIS, MET, HIS, and CYS along the chain are also likely to form similar structures. The amino
acids ASN-GLY at positions 2 and 3 and at positions 12 and 13 give the amino acids that carry the metal-binding
ligands enough flexibility for them to come together and bind the metal. Other connecting sequences may be used, e.g.,
GLY-ASN, SER-GLY, GLY-PRO, GLY-PRO-GLY, or PRO-GLY-ASN could be used. It is also possible to vary one or
more residues in the loops that join the first and second or the third and fourth metal-binding residues. For example,




I \ / 

PR03 Cu 

\ / \ 

GLY2-fiISl CYSX5-CLT14 

I I 
NHj COO 



is likely to form the diagrammed structure for a wide variety of amino acids at Xaa4. It is expected that the side groups
of Xaa4 and Xaa5 will be close together and on the surface of the mini-protein.

[0303] The variable amino acids are held so that they have limited flexibility. This cross-linkage has some differences
from the disulfide linkage. The separation between Cα4 and Cα11 is greater than the separation of the Cαs of a cystine.
In addition, the interaction of residues 1 through 4 and 11 through 14 with the metal ion is expected to limit the motion
of residues 5 through 10 more than a disulfide between residues 4 and 11. A single disulfide bond exerts strong distance
constraints on the α carbons of the joined residues, but very little directional constraint on, for example, the vector from
N to C in the main-chain.

[0304] For the desired sequence, the side groups of residues 5 through 10 can form specific interactions with the
target. Other numbers of variable amino acids, for example, 4, 5, 7, or 8, are appropriate. Larger spans may be used
when the enclosed sequence contains segments having a high potential to form α helices or other secondary structure
that limits the conformational freedom of the polypeptide main chain. Whereas a mini-protein having four CYSs could
form three distinct pairings, a mini-protein having two HISs, one MET, and one CYS can form only two distinct complexes
with Cu. These two structures are related by mirror symmetry through the Cu. Because the two HISs are distinguishable,
the structures are different.
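
The statement that four cysteines could form three distinct pairings can be checked by enumerating every way of joining the cysteines into disjoint pairs. The following Python sketch is illustrative only; the residue labels and the function name are hypothetical, and it counts pairings purely combinatorially without regard to whether a given pairing is sterically feasible.

```python
def pairings(residues):
    """Yield every way of joining the residues into disjoint pairs."""
    if not residues:
        yield []
        return
    first, rest = residues[0], residues[1:]
    for i, partner in enumerate(rest):
        # Pair the first residue with one partner, then pair up the remainder.
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


# Hypothetical labels for four cysteines in a mini-protein.
four_cys = ["CYS-a", "CYS-b", "CYS-c", "CYS-d"]
for pairing in pairings(four_cys):
    print(pairing)
```

For four cysteines the enumeration yields three pairings; for six cysteines, as in the three-disulfide micro-proteins, the same function yields fifteen.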

[0305] When such metal-containing mini-proteins are displayed on filamentous phage, the cells that produce the
phage can be grown in the presence of the appropriate metal ion, or the phage can be exposed to the metal only after
they are separated from the cells.

EXAMPLE V 

A MINI-PROTEIN HAVING A CROSS-LINK CONSISTING OF ZN(II) AND FOUR CYSTEINES

[0306] A cross link similar to the one shown in Example IV is exemplified by the zinc-finger proteins (GIBS88,
GAUS87, PARR88, FRAN87, CHOW87, HARD90). One family of zinc-fingers has two CYS and two HIS residues in
conserved positions that bind Zn(II) (PARR88, FRAN87, CHOW87, EVAN88, BERG88, CHAV88). Gibson et al.
(GIBS88) review a number of sequences thought to form zinc-fingers and propose a three-dimensional model for these
compounds. Most of these sequences have two CYS and two HIS residues in conserved positions, but some have
three CYS and one HIS residue. Gauss et al. (GAUS87) also report a zinc-finger protein having three CYS and one
HIS residues that bind zinc. Hard et al. (HARD90) report the 3D structure of a protein that comprises two zinc-fingers,
each of which has four CYS residues. All of these zinc-binding proteins are stable in the reducing intracellular
environment.

[0307] One preferred example of a CYS::zinc cross-linked mini-protein comprises residues 440 to 461 of the
sequence shown in Figure 1 of HARD90. The residues 444 through 456 may be variegated. One such variegation is as
follows:



    Parental    Allowed                                    #AA/#DNA
    SER444      SER, ALA                                   2/2
    ASP445      ASP, ASN, GLU, LYS                         4/4
    GLU446      GLU, LYS, GLN                              3/3
    ALA447      ALA, THR, GLY, SER                         4/4
    SER448      SER, ALA                                   2/2
    GLY449      GLY, SER, ASN, ASP                         4/4
    CYS450      CYS, PHE, ARG, LEU                         4/4
    HIS451      HIS, GLN, ASN, LYS, ASP, GLU               6/6
    TYR452      TYR, PHE, HIS, LEU                         4/4
    GLY453      GLY, SER, ASN, ASP                         4/4
    VAL454      VAL, ALA, ASP, GLY, SER, ASN, THR, ILE     8/8
    LEU455      LEU, HIS, ASP, VAL                         4/4
    THR456      THR, ILE, ASN, SER                         4/4

This leads to 3.77×10^7 DNA sequences that encode the same number of amino-acid sequences. A library having 1.0×10^8
independent transformants will display 93% of the allowed sequences; 2.0×10^8 independent transformants will display
99.5% of allowed sequences.
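
The library sizes quoted for this variegation can be approached from the other direction, by asking how many independent transformants are needed to reach a chosen coverage. The sketch below is an illustration under stated assumptions and not part of the original disclosure: it takes the numbers of allowed residues at positions 444 through 456 to be 2, 4, 3, 4, 2, 4, 4, 6, 4, 4, 8, 4, and 4, assumes that each allowed residue is encoded by exactly one codon and that all sequences are equally likely, and inverts the relation f = 1 - exp(-T/N). The names are hypothetical.

```python
import math

# Assumed number of allowed residues at positions 444 through 456.
ALLOWED_PER_POSITION = [2, 4, 3, 4, 2, 4, 4, 6, 4, 4, 8, 4, 4]


def n_sequences(allowed_per_position):
    """Total number of sequences: the product of the per-position choices."""
    return math.prod(allowed_per_position)


def transformants_for_coverage(fraction, n_seq):
    """Transformants T needed so that 1 - exp(-T/N) equals the fraction."""
    return -n_seq * math.log(1.0 - fraction)


N = n_sequences(ALLOWED_PER_POSITION)
print(N)  # total number of encoded sequences
for target in (0.93, 0.995):
    print(target, f"{transformants_for_coverage(target, N):.2e}")
```

Under these assumptions the product is 37,748,736 sequences, and coverages of 93% and 99.5% call for roughly 1.0×10^8 and 2.0×10^8 transformants, respectively.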



Table 2: Preferred Outer-Surface Proteins

Genetic Package      Preferred Outer-       Reason for preference
                     Surface Protein

M13                  coat protein           a) exposed amino terminus,
                     (gpVIII)               b) predictable post-translational processing,
                                            c) numerous copies in virion,
                                            d) fusion data available.

                     gpIII                  a) fusion data available,
                                            b) amino terminus exposed,
                                            c) working example available.

PhiX174              G protein              a) known to be on virion exterior,
                                            b) small enough that the G-ipbd gene can replace H gene.

E. coli              LamB                   a) fusion data available,
                                            b) non-essential.

                     OmpC                   a) topological model,
                                            b) non-essential; abundant.

                     OmpA                   a) topological model,
                                            b) non-essential; abundant,
                                            c) homologues in other genera.

                     OmpF                   a) topological model,
                                            b) non-essential; abundant.

                     PhoE                   a) topological model,
                                            b) non-essential; abundant,
                                            c) inducible.

B. subtilis spores   CotC                   a) no post-translational processing,
                                            b) distinctive sequence that causes protein to localize in spore coat,
                                            c) non-essential.

                     CotD                   Same as for CotC.
Table 10: Abundances obtained from various vgCodons

A. Optimized fxS Codon, Restrained by [D] + [E] = [K] + [R]

            T       C       A       G
     1     .26     .18     .26     .30      f
     2     .22     .16     .40     .22      x
     3     .5      .0      .0      .5       S

     Amino                          Amino
     acid     Abundance             acid     Abundance
     A        4.80%                 C        2.86%
     D        6.00%                 E        6.00%
     F        2.86%                 G        6.60%
     H        3.60%                 I        2.86%
     K        5.20%                 L        6.82%
     M        2.86%                 N        5.20%
     P        2.88%                 Q        3.60%
     R        6.82%                 S        7.02% mfaa
     T        4.16%                 V        6.60%
     W        2.86% lfaa            Y        5.20%
     stop     5.20%

     [D] + [E] = [K] + [R] = .12
     ratio = Abun(W)/Abun(S) = 0.4074

     j     (1/ratio)^j     (ratio)^j       stop-free
     1     2.454           .4074           .9480
     2     6.025           .1660           .8987
     3     14.788          .0676           .8520
     4     36.298          .0275           .8077
     5     89.095          .0112           .7657
     6     218.7           4.57×10^-3      .7258
     7     536.8           1.86×10^-3      .6881
B. Unrestrained, optimized

            T       C       A       G
     1     .27     .19     .27     .27
     2     .21     .15     .43     .21
     3     .5      .0      .0      .5

     Amino                          Amino
     acid     Abundance             acid     Abundance
     A        4.05%                 C        2.84%
     D        5.81%                 E        5.81%
     F        2.84%                 G        5.67%
     H        4.08%                 I        2.84%
     K        5.81%                 L        6.83%
     M        2.84%                 N        5.81%
     P        2.85%                 Q        4.08%
     R        6.83%                 S        6.89% mfaa
     T        4.05%                 V        5.67%
     W        2.84% lfaa            Y        5.81%
     stop     5.81%

     [D] + [E] = 0.1162     [K] + [R] = 0.1264
     ratio = Abun(W)/Abun(S) = 0.41176

     j     (1/ratio)^j     (ratio)^j       stop-free
     1     2.4286          .41176          .9419
     2     5.8981          .16955          .8872
     3     14.3241         .06981          .8356
     4     34.7875         .02875          .7871
     5     84.4849         .011836         .74135
     6     205.180         .004874         .69828
     7     498.3           2.007×10^-3     .6577
C. Optimized NNT

            T        C        A        G
     1     .2071    .2929    .2071    .2929
     2     .2929    .2071    .2929    .2071
     3     1.       .0       .0       .0

     Amino                          Amino
     acid     Abundance             acid     Abundance
     A        6.06%                 C        4.29% lfaa
     D        8.58%                 E        none
     F        6.06%                 G        6.06%
     H        8.58%                 I        6.06%
     K        none                  L        8.58%
     M        none                  N        6.06%
     P        6.06%                 Q        none
     R        6.06%                 S        8.58% mfaa
     T        4.29% lfaa            V        8.58%
     W        none                  Y        6.06%
     stop     none

     j     (1/ratio)^j     (ratio)^j       stop-free
     1     2.0             .5              1.
     2     4.0             .25             1.
     3     8.0             .125            1.
     4     16.0            .0625           1.
     5     32.0            .03125          1.
     6     64.0            .015625         1.
     7     128.0           .0078125        1.
D. Optimized NNG

            T       C       A       G
     1     .23     .21     .23     .33
     2     .215    .285    .285    .215
     3     .0      .0      .0      1.0

     Amino                          Amino
     acid     Abundance             acid     Abundance
     A        9.40%                 C        none
     D        none                  E        9.40%
     F        none                  G        7.10%
     H        none                  I        none
     K        6.60%                 L        9.50% mfaa
     M        4.90%                 N        none
     P        6.00%                 Q        6.00%
     R        9.50%                 S        6.60%
     T        6.60%                 V        7.10%
     W        4.90% lfaa            Y        none

     j     (1/ratio)^j     (ratio)^j       stop-free
     1     1.9388          .51579          0.934
     2     3.7588          .26604          0.8723
     3     7.2876          .13722          0.8148
     4     14.1289         .07078          0.7610
     5     27.3929         3.65×10^-2      0.7108
     6     53.109          1.88×10^-2      0.6639
     7     102.96          9.72×10^-3      0.6200
E. Unoptimized NNS (NNK gives identical distribution)

            T       C       A       G
     1     .25     .25     .25     .25
     2     .25     .25     .25     .25
     3     .0      .5      .0      .5

     Amino                          Amino
     acid     Abundance             acid     Abundance
     A        6.25%                 C        3.125%
     D        3.125%                E        3.125%
     F        3.125%                G        6.25%
     H        3.125%                I        3.125%
     K        3.125%                L        9.375%
     M        3.125%                N        3.125%
     P        6.25%                 Q        3.125%
     R        9.375%                S        9.375%
     T        6.25%                 V        6.25%
     W        3.125%                Y        3.125%
     stop     3.125%

     j     (1/ratio)^j     (ratio)^j       stop-free
     1     3.0             .33333          .96875
     2     9.0             .11111          .9385
     3     27.0            .03704          .90915
     4     81.0            .01234567       .8807
     5     243.0           .0041152        .8532
     6     729.0           1.37×10^-3      .82655
     7     2187.0          4.57×10^-4      .8007
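
The abundances in Table 10 follow from multiplying the base probabilities at the three codon positions and summing over the codons that encode each amino acid. The Python sketch below is illustrative and not part of the original disclosure. It assumes the standard genetic code, takes the third position of the fxS codon to be an equimolar mixture of C and G (as the name S suggests), and reports the ratio of the least-favored to the most-favored amino acid (lfaa/mfaa) together with the stop-free fraction. All names are hypothetical.

```python
from itertools import product

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def abundances(dist):
    """Frequency of each amino acid (and '*') for per-position base mixes.

    dist is a list of three dicts mapping each base to its probability.
    """
    freq = {}
    for b1, b2, b3 in product(BASES, repeat=3):
        p = dist[0][b1] * dist[1][b2] * dist[2][b3]
        aa = CODE[b1 + b2 + b3]
        freq[aa] = freq.get(aa, 0.0) + p
    return freq


def summary(dist):
    """Return (lfaa/mfaa ratio, stop-free fraction) for one vgCodon."""
    freq = abundances(dist)
    coding = [p for aa, p in freq.items() if aa != "*" and p > 0]
    return min(coding) / max(coding), 1.0 - freq["*"]


EQUAL = dict(T=0.25, C=0.25, A=0.25, G=0.25)
C_OR_G = dict(T=0.0, C=0.5, A=0.0, G=0.5)  # assumed meaning of "S"

NNS = [EQUAL, EQUAL, C_OR_G]
FXS = [dict(T=0.26, C=0.18, A=0.26, G=0.30),
       dict(T=0.22, C=0.16, A=0.40, G=0.22),
       C_OR_G]

for name, dist in (("NNS", NNS), ("fxS", FXS)):
    ratio, stop_free = summary(dist)
    print(name, round(ratio, 4), round(stop_free, 4))
```

With these inputs the unoptimized NNS codon gives a ratio of one third and a stop-free fraction of 0.96875, while the optimized fxS mixture gives about 0.4074 and 0.948, matching parts E and A of the table.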
Table 130: Sampling of a Library encoded by (NNK)6

A. Numbers of hexapeptides in each class

     total = 64,000,000 stop-free sequences.

     α can be one of [WMFYCIKDENHQ]
     Φ can be one of [PTAVG]
     Ω can be one of [SLR]

     Class       Number         Class       Number
     αααααα      2985984.       Φααααα      7464960.
     Ωααααα      4478976.       ΦΦαααα      7776000.
     ΦΩαααα      9331200.       ΩΩαααα      2799360.
     ΦΦΦααα      4320000.       ΦΦΩααα      7776000.
     ΦΩΩααα      4665600.       ΩΩΩααα      933120.
     ΦΦΦΦαα      1350000.       ΦΦΦΩαα      3240000.
     ΦΦΩΩαα      2916000.       ΦΩΩΩαα      1166400.
     ΩΩΩΩαα      174960.        ΦΦΦΦΦα      225000.
     ΦΦΦΦΩα      675000.        ΦΦΦΩΩα      810000.
     ΦΦΩΩΩα      486000.        ΦΩΩΩΩα      145800.
     ΩΩΩΩΩα      17496.         ΦΦΦΦΦΦ      15625.
     ΦΦΦΦΦΩ      56250.         ΦΦΦΦΩΩ      84375.
     ΦΦΦΩΩΩ      67500.         ΦΦΩΩΩΩ      30375.
     ΦΩΩΩΩΩ      7290.          ΩΩΩΩΩΩ      729.

ΦΦΩΩαα, for example, stands for the set of peptides having
two amino acids from the α class, two from Φ, and two from
Ω arranged in any order. There are, for example, 729 = 3^6
sequences composed entirely of S, L, and R.
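
The class sizes in part A can be reproduced by a multinomial count. The sketch below is an illustration only, not part of the original disclosure. It assumes that, for an NNK codon, twelve amino acids have one codon each, five have two codons each, and three (S, L, R) have three codons each, with 31 stop-free codons in total. The function names and the tuple ordering (one-codon, two-codon, three-codon) are hypothetical.

```python
from math import factorial

# Assumed class sizes for NNK: amino acids with 1, 2, and 3 codons.
N_ONE, N_TWO, N_THREE = 12, 5, 3
STOP_FREE_CODONS = 31  # NNK has 32 codons, one of which is a stop


def class_size(a, p, o):
    """Number of hexapeptides with a, p, o residues from the three classes."""
    if a + p + o != 6:
        raise ValueError("class counts must sum to 6")
    orderings = factorial(6) // (factorial(a) * factorial(p) * factorial(o))
    return orderings * N_ONE**a * N_TWO**p * N_THREE**o


def class_probability(a, p, o):
    """Probability that a stop-free DNA sequence falls in the class."""
    dna_per_peptide = 1**a * 2**p * 3**o  # codon multiplicity per residue
    return class_size(a, p, o) * dna_per_peptide / STOP_FREE_CODONS**6


# All 28 ways of splitting six positions among the three classes.
ALL_CLASSES = [(a, p, 6 - a - p) for a in range(7) for p in range(7 - a)]

print(sum(class_size(*c) for c in ALL_CLASSES))  # total stop-free peptides
print(class_size(2, 2, 2), class_probability(0, 0, 6))
```

The 28 class sizes sum to 20^6 = 64,000,000, and the class made up entirely of three-codon amino acids has 729 members with a total probability of about 5.988×10^-4.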
B. Probability that any given stop-free DNA sequence will encode a hexapeptide from a stated class.

     Class       P            (% of class)
     αααααα      3.364E-03    (1.13E-07)
     Φααααα      1.682E-02    (2.25E-07)
     Ωααααα      1.514E-02    (3.38E-07)
     ΦΦαααα      3.505E-02    (4.51E-07)
     ΦΩαααα      6.308E-02    (6.76E-07)
     ΩΩαααα      2.839E-02    (1.01E-06)
     ΦΦΦααα      3.894E-02    (9.01E-07)
     ΦΦΩααα      1.051E-01    (1.35E-06)
     ΦΩΩααα      9.463E-02    (2.03E-06)
     ΩΩΩααα      2.839E-02    (3.04E-06)
     ΦΦΦΦαα      2.434E-02    (1.80E-06)
     ΦΦΦΩαα      8.762E-02    (2.70E-06)
     ΦΦΩΩαα      1.183E-01    (4.06E-06)
     ΦΩΩΩαα      7.097E-02    (6.08E-06)
     ΩΩΩΩαα      1.597E-02    (9.13E-06)
     ΦΦΦΦΦα      8.113E-03    (3.61E-06)
     ΦΦΦΦΩα      3.651E-02    (5.41E-06)
     ΦΦΦΩΩα      6.571E-02    (8.11E-06)
     ΦΦΩΩΩα      5.914E-02    (1.22E-05)
     ΦΩΩΩΩα      2.661E-02    (1.83E-05)
     ΩΩΩΩΩα      4.790E-03    (2.74E-05)
     ΦΦΦΦΦΦ      1.127E-03    (7.21E-06)
     ΦΦΦΦΦΩ      6.084E-03    (1.08E-05)
     ΦΦΦΦΩΩ      1.369E-02    (1.62E-05)
     ΦΦΦΩΩΩ      1.643E-02    (2.43E-05)
     ΦΦΩΩΩΩ      1.109E-02    (3.65E-05)
     ΦΩΩΩΩΩ      3.992E-03    (5.48E-05)
     ΩΩΩΩΩΩ      5.988E-04    (8.21E-05)
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C. 



Table 130: Sampling of a Library encoded by (NNK)^6

(continued)

Number of different stop-free amino-acid sequences in each class expected for various library sizes

Library size = 1.0000E+06
total = 9.7446E+05    % sampled = 1.52

Class     Number (%)             Class     Number (%)
αααααα      3362.6 (  0.1)     Φααααα     16803.4 (  0.2)
Ωααααα     15114.6 (  0.3)     ΦΦαααα     34967.8 (  0.4)
ΦΩαααα     62871.1 (  0.7)     ΩΩαααα     28244.3 (  1.0)
ΦΦΦααα     38765.7 (  0.9)     ΦΦΩααα    104432.2 (  1.3)
ΦΩΩααα     93672.7 (  2.0)     ΩΩΩααα     27960.3 (  3.0)
ΦΦΦΦαα     24119.9 (  1.8)     ΦΦΦΩαα     86442.5 (  2.7)
ΦΦΩΩαα    115915.5 (  4.0)     ΦΩΩΩαα     68853.5 (  5.9)
ΩΩΩΩαα     15261.1 (  8.7)     ΦΦΦΦΦα      7968.1 (  3.5)
ΦΦΦΦΩα     35537.2 (  5.3)     ΦΦΦΩΩα     63117.5 (  7.8)
ΦΦΩΩΩα     55684.4 ( 11.5)     ΦΩΩΩΩα     24325.9 ( 16.7)
ΩΩΩΩΩα      4190.6 ( 24.0)     ΦΦΦΦΦΦ      1087.1 (  7.0)
ΦΦΦΦΦΩ      5767.0 ( 10.3)     ΦΦΦΦΩΩ     12637.2 ( 15.0)
ΦΦΦΩΩΩ     14581.7 ( 21.6)     ΦΦΩΩΩΩ      9290.2 ( 30.6)
ΦΩΩΩΩΩ      3073.9 ( 42.2)     ΩΩΩΩΩΩ       408.4 ( 56.0)



Library size = 3.0000E+06
total = 2.7885E+06    % sampled = 4.36

αααααα     10076.4 (  0.3)     Φααααα     50296.9 (  0.7)
Ωααααα     45190.9 (  1.0)     ΦΦαααα    104432.2 (  1.3)
ΦΩαααα    187345.5 (  2.0)     ΩΩαααα     83880.9 (  3.0)
ΦΦΦααα    115256.6 (  2.7)     ΦΦΩααα    309107.9 (  4.0)
ΦΩΩααα    275413.9 (  5.9)     ΩΩΩααα     81392.5 (  8.7)
ΦΦΦΦαα     71074.5 (  5.3)     ΦΦΦΩαα    252470.2 (  7.8)
ΦΦΩΩαα    334106.2 ( 11.5)     ΦΩΩΩαα    194606.9 ( 16.7)
ΩΩΩΩαα     41905.9 ( 24.0)     ΦΦΦΦΦα     23067.8 ( 10.3)
ΦΦΦΦΩα    101097.3 ( 15.0)     ΦΦΦΩΩα    174981.0 ( 21.6)
ΦΦΩΩΩα    148643.7 ( 30.6)     ΦΩΩΩΩα     61478.9 ( 42.2)
ΩΩΩΩΩα      9801.0 ( 56.0)     ΦΦΦΦΦΦ      3039.6 ( 19.5)
ΦΦΦΦΦΩ     15587.7 ( 27.7)     ΦΦΦΦΩΩ     32516.8 ( 38.5)
ΦΦΦΩΩΩ     34975.6 ( 51.8)     ΦΦΩΩΩΩ     20215.5 ( 66.6)
ΦΩΩΩΩΩ      5879.9 ( 80.7)     ΩΩΩΩΩΩ       667.0 ( 91.5)

Table 130: Sampling of a Library encoded by (NNK)^6

(continued)

Library size = 1.0000E+07

total = 8.1204E+06    % sampled = 12.69

αααααα     33455.9 (  1.1)     Φααααα    166342.4 (  2.2)
Ωααααα    148871.1 (  3.3)     ΦΦαααα    342685.7 (  4.4)
ΦΩαααα    609987.6 (  6.5)     ΩΩαααα    269958.3 (  9.6)
ΦΦΦααα    372371.8 (  8.6)     ΦΦΩααα    983416.4 ( 12.6)
ΦΩΩααα    856471.6 ( 18.4)     ΩΩΩααα    244761.5 ( 26.2)
ΦΦΦΦαα    222702.0 ( 16.5)     ΦΦΦΩαα    767692.5 ( 23.7)
ΦΦΩΩαα    972324.6 ( 33.3)     ΦΩΩΩαα    531651.3 ( 45.6)
ΩΩΩΩαα    104722.3 ( 59.9)     ΦΦΦΦΦα     68111.0 ( 30.3)
ΦΦΦΦΩα    281976.3 ( 41.8)     ΦΦΦΩΩα    450120.2 ( 55.6)
ΦΦΩΩΩα    342072.1 ( 70.4)     ΦΩΩΩΩα    122302.6 ( 83.9)
ΩΩΩΩΩα     16364.0 ( 93.5)     ΦΦΦΦΦΦ      8028.0 ( 51.4)
ΦΦΦΦΦΩ     37179.9 ( 66.1)     ΦΦΦΦΩΩ     67719.5 ( 80.3)
ΦΦΦΩΩΩ     61580.0 ( 91.2)     ΦΦΩΩΩΩ     29586.1 ( 97.4)
ΦΩΩΩΩΩ      7259.5 ( 99.6)     ΩΩΩΩΩΩ       728.8 (100.0)



Library size = 3.0000E+07
total = 1.8633E+07    % sampled = 29.11

αααααα     99247.4 (  3.3)     Φααααα    487990.0 (  6.5)
Ωααααα    431933.3 (  9.6)     ΦΦαααα    983416.5 ( 12.6)
ΦΩαααα   1712943.0 ( 18.4)     ΩΩαααα    734284.6 ( 26.2)
ΦΦΦααα   1023590.0 ( 23.7)     ΦΦΩααα   2592866.0 ( 33.3)
ΦΩΩααα   2126605.0 ( 45.6)     ΩΩΩααα    558519.0 ( 59.9)
ΦΦΦΦαα    563952.6 ( 41.8)     ΦΦΦΩαα   1800481.0 ( 55.6)
ΦΦΩΩαα   2052433.0 ( 70.4)     ΦΩΩΩαα    978420.5 ( 83.9)
ΩΩΩΩαα    163640.3 ( 93.5)     ΦΦΦΦΦα    148719.7 ( 66.1)
ΦΦΦΦΩα    541755.7 ( 80.3)     ΦΦΦΩΩα    738960.1 ( 91.2)
ΦΦΩΩΩα    473377.0 ( 97.4)     ΦΩΩΩΩα    145189.7 ( 99.6)
ΩΩΩΩΩα     17491.3 (100.0)     ΦΦΦΦΦΦ     13829.1 ( 88.5)
ΦΦΦΦΦΩ     54058.1 ( 96.1)     ΦΦΦΦΩΩ     83726.0 ( 99.2)
ΦΦΦΩΩΩ     67454.5 ( 99.9)     ΦΦΩΩΩΩ     30374.5 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)     ΩΩΩΩΩΩ       729.0 (100.0)

Table 130: Sampling of a Library encoded by (NNK)^6

(continued)



Library size = 7.6000E+07
total = 3.2125E+07    % sampled = 50.19

αααααα    245057.8 (  8.2)     Φααααα   1175010.0 ( 15.7)
Ωααααα   1014733.0 ( 22.7)     ΦΦαααα   2255280.0 ( 29.0)
ΦΩαααα   3749112.0 ( 40.2)     ΩΩαααα   1504128.0 ( 53.7)
ΦΦΦααα   2142478.0 ( 49.6)     ΦΦΩααα   4993247.0 ( 64.2)
ΦΩΩααα   3666785.0 ( 78.6)     ΩΩΩααα    840691.9 ( 90.1)
ΦΦΦΦαα   1007002.0 ( 74.6)     ΦΦΦΩαα   2825063.0 ( 87.2)
ΦΦΩΩαα   2782358.0 ( 95.4)     ΦΩΩΩαα   1154956.0 ( 99.0)
ΩΩΩΩαα    174790.0 ( 99.9)     ΦΦΦΦΦα    210475.6 ( 93.5)
ΦΦΦΦΩα    663929.3 ( 98.4)     ΦΦΦΩΩα    808296.6 ( 99.8)
ΦΦΩΩΩα    485953.2 (100.0)     ΦΩΩΩΩα    145799.9 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)     ΦΦΦΦΦΦ     15559.9 ( 99.6)
ΦΦΦΦΦΩ     56234.9 (100.0)     ΦΦΦΦΩΩ     84374.6 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)     ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)     ΩΩΩΩΩΩ       729.0 (100.0)



Library size = 1.0000E+08
total = 3.6537E+07    % sampled = 57.09

αααααα    318185.1 ( 10.7)     Φααααα   1506161.0 ( 20.2)
Ωααααα   1284677.0 ( 28.7)     ΦΦαααα   2821285.0 ( 36.3)
ΦΩαααα   4585163.0 ( 49.1)     ΩΩαααα   1783932.0 ( 63.7)
ΦΦΦααα   2566085.0 ( 59.4)     ΦΦΩααα   5764391.0 ( 74.1)
ΦΩΩααα   4051713.0 ( 86.8)     ΩΩΩααα    888584.3 ( 95.2)
ΦΦΦΦαα   1127473.0 ( 83.5)     ΦΦΦΩαα   3023170.0 ( 93.3)
ΦΦΩΩαα   2865517.0 ( 98.3)     ΦΩΩΩαα   1163743.0 ( 99.8)
ΩΩΩΩαα    174941.0 (100.0)     ΦΦΦΦΦα    218886.6 ( 97.3)
ΦΦΦΦΩα    671976.9 ( 99.6)     ΦΦΦΩΩα    809757.3 (100.0)
ΦΦΩΩΩα    485997.5 (100.0)     ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)     ΦΦΦΦΦΦ     15613.5 ( 99.9)
ΦΦΦΦΦΩ     56248.9 (100.0)     ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)     ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)     ΩΩΩΩΩΩ       729.0 (100.0)

Table 130: Sampling of a Library encoded by (NNK)^6

(continued)

Library size = 3.0000E+08
total = 5.2634E+07    % sampled = 82.24

αααααα    856451.3 ( 28.7)     Φααααα   3668130.0 ( 49.1)
Ωααααα   2654291.0 ( 63.7)     ΦΦαααα   5764391.0 ( 74.1)
ΦΩαααα   8103426.0 ( 86.8)     ΩΩαααα   2665753.0 ( 95.2)
ΦΦΦααα   4030893.0 ( 93.3)     ΦΦΩααα   7641378.0 ( 98.3)
ΦΩΩααα   4654972.0 ( 99.8)     ΩΩΩααα    933018.6 (100.0)
ΦΦΦΦαα   1343954.0 ( 99.6)     ΦΦΦΩαα   3239029.0 (100.0)
ΦΦΩΩαα   2915985.0 (100.0)     ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)     ΦΦΦΦΦα    224995.5 (100.0)
ΦΦΦΦΩα    674999.9 (100.0)     ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)     ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)     ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)     ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)     ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)     ΩΩΩΩΩΩ       729.0 (100.0)



Library size = 1.0000E+09

total = 6.1999E+07    % sampled = 96.87

αααααα   2018278.0 ( 67.6)     Φααααα   6680917.0 ( 89.5)
Ωααααα   4326519.0 ( 96.6)     ΦΦαααα   7690221.0 ( 98.9)
ΦΩαααα   9320389.0 ( 99.9)     ΩΩαααα   2799250.0 (100.0)
ΦΦΦααα   4319475.0 (100.0)     ΦΦΩααα   7775990.0 (100.0)
ΦΩΩααα   4665600.0 (100.0)     ΩΩΩααα    933120.0 (100.0)
ΦΦΦΦαα   1350000.0 (100.0)     ΦΦΦΩαα   3240000.0 (100.0)
ΦΦΩΩαα   2916000.0 (100.0)     ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)     ΦΦΦΦΦα    225000.0 (100.0)
ΦΦΦΦΩα    675000.0 (100.0)     ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)     ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)     ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)     ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)     ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)     ΩΩΩΩΩΩ       729.0 (100.0)

Table 130: Sampling of a Library encoded by (NNK)^6

(continued)



Library size = 3.0000E+09
total = 6.3890E+07    % sampled = 99.83

αααααα   2884346.0 ( 96.6)     Φααααα   7456311.0 ( 99.9)
Ωααααα   4478800.0 (100.0)     ΦΦαααα   7775990.0 (100.0)
ΦΩαααα   9331200.0 (100.0)     ΩΩαααα   2799360.0 (100.0)
ΦΦΦααα   4320000.0 (100.0)     ΦΦΩααα   7776000.0 (100.0)
ΦΩΩααα   4665600.0 (100.0)     ΩΩΩααα    933120.0 (100.0)
ΦΦΦΦαα   1350000.0 (100.0)     ΦΦΦΩαα   3240000.0 (100.0)
ΦΦΩΩαα   2916000.0 (100.0)     ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)     ΦΦΦΦΦα    225000.0 (100.0)
ΦΦΦΦΩα    675000.0 (100.0)     ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)     ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)     ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)     ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)     ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)     ΩΩΩΩΩΩ       729.0 (100.0)

Table 130, continued

D. Formulae for tabulated quantities.

Lsize is the number of independent transformants.

31**6 is 31 to sixth power; 6*3 means 6 times 3.

A = Lsize/(31**6)

α can be one of [WMFYCIKDENHQ.]

Φ can be one of [PTAVG]

Ω can be one of [SLR]

F0 = (12)**6    F1 = (12)**5    F2 = (12)**4

F3 = (12)**3    F4 = (12)**2    F5 = (12)

F6 = 1

αααααα = F0 * (1-exp(-A))
Φααααα = 6 * 5 * F1 * (1-exp(-2*A))
Ωααααα = 6 * 3 * F1 * (1-exp(-3*A))
ΦΦαααα = (15) * 5**2 * F2 * (1-exp(-4*A))
ΦΩαααα = (6*5)*5*3 * F2 * (1-exp(-6*A))
ΩΩαααα = (15) * 3**2 * F2 * (1-exp(-9*A))
ΦΦΦααα = (20)*(5**3) * F3 * (1-exp(-8*A))
ΦΦΩααα = (60)*(5*5*3) * F3 * (1-exp(-12*A))
ΦΩΩααα = (60)*(5*3*3) * F3 * (1-exp(-18*A))
ΩΩΩααα = (20)*(3)**3 * F3 * (1-exp(-27*A))
ΦΦΦΦαα = (15)*(5)**4 * F4 * (1-exp(-16*A))
ΦΦΦΩαα = (60)*(5)**3*3 * F4 * (1-exp(-24*A))
ΦΦΩΩαα = (90)*(5*5*3*3) * F4 * (1-exp(-36*A))
ΦΩΩΩαα = (60)*(5*3*3*3) * F4 * (1-exp(-54*A))
ΩΩΩΩαα = (15)*(3)**4 * F4 * (1-exp(-81*A))
ΦΦΦΦΦα = (6)*(5)**5 * F5 * (1-exp(-32*A))
ΦΦΦΦΩα = 30*5*5*5*5*3 * F5 * (1-exp(-48*A))
ΦΦΦΩΩα = 60*5*5*5*3*3 * F5 * (1-exp(-72*A))
ΦΦΩΩΩα = 60*5*5*3*3*3 * F5 * (1-exp(-108*A))
ΦΩΩΩΩα = 30*5*3*3*3*3 * F5 * (1-exp(-162*A))
ΩΩΩΩΩα = 6*3*3*3*3*3 * F5 * (1-exp(-243*A))
ΦΦΦΦΦΦ = 5**6 * (1-exp(-64*A))
ΦΦΦΦΦΩ = 6*3*5**5 * (1-exp(-96*A))
ΦΦΦΦΩΩ = 15*3*3*5**4 * (1-exp(-144*A))
ΦΦΦΩΩΩ = 20*3**3*5**3 * (1-exp(-216*A))
ΦΦΩΩΩΩ = 15*3**4*5**2 * (1-exp(-324*A))
ΦΩΩΩΩΩ = 6*3**5*5 * (1-exp(-486*A))
ΩΩΩΩΩΩ = 3**6 * (1-exp(-729*A))

total = αααααα + Φααααα + Ωααααα + ΦΦαααα + ΦΩαααα +
        ΩΩαααα + ΦΦΦααα + ΦΦΩααα + ΦΩΩααα + ΩΩΩααα +
        ΦΦΦΦαα + ΦΦΦΩαα + ΦΦΩΩαα + ΦΩΩΩαα + ΩΩΩΩαα +
        ΦΦΦΦΦα + ΦΦΦΦΩα + ΦΦΦΩΩα + ΦΦΩΩΩα + ΦΩΩΩΩα +
        ΩΩΩΩΩα + ΦΦΦΦΦΦ + ΦΦΦΦΦΩ + ΦΦΦΦΩΩ + ΦΦΦΩΩΩ +
        ΦΦΩΩΩΩ + ΦΩΩΩΩΩ + ΩΩΩΩΩΩ
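
The formulae of section D can be evaluated mechanically. The Python sketch below is an illustration only and is not part of the original disclosure; the names `expected_by_class` and `percent_sampled` are hypothetical. It assumes, as the formulae do, that a class is fully described by how many of the six positions hold a two-codon residue (Φ) or a three-codon residue (Ω), that the remaining positions hold one of the twelve single-codon residues (α), and that the "% sampled" figure is the total divided by 20**6.

```python
from math import comb, exp

POSITIONS = 6
N_STOP_FREE_DNA = 31 ** POSITIONS  # NNK has 32 codons; the single stop (TAG) is excluded


def expected_by_class(lsize):
    """Map (n_phi, n_omega) -> expected number of distinct sequences in that class."""
    a = lsize / N_STOP_FREE_DNA  # "A" in the formulae of section D
    expected = {}
    for n_phi in range(POSITIONS + 1):
        for n_omega in range(POSITIONS + 1 - n_phi):
            n_alpha = POSITIONS - n_phi - n_omega
            # number of ways to place the phi and omega residues among six positions
            arrangements = comb(POSITIONS, n_phi) * comb(POSITIONS - n_phi, n_omega)
            n_sequences = arrangements * 12**n_alpha * 5**n_phi * 3**n_omega
            # number of DNA sequences encoding each amino-acid sequence of the class
            dna_per_sequence = 2**n_phi * 3**n_omega
            expected[(n_phi, n_omega)] = n_sequences * (1 - exp(-dna_per_sequence * a))
    return expected


def percent_sampled(lsize):
    """Expected distinct sequences as a percentage of the 20**6 possible hexapeptides."""
    return 100 * sum(expected_by_class(lsize).values()) / 20**POSITIONS
```

For a library of 1.0E+06 transformants, `sum(expected_by_class(1.0e6).values())` should agree with the tabulated total of about 9.7446E+05.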
Table 131: Sampling of a Library
Encoded by (NNT)^4(NNG)^2

X can be F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G
Γ can be L²,R²,S,W,P,Q,M,T,K,V,A,E,G

Library comprises 8.55·10^6 amino-acid sequences; 1.47·10^7 DNA
sequences.

Total number of possible aa sequences = 8,555,625

x   LVPTARGFYCHNID
S   S
θ   SWPQMTKVAEG
Ω   LR

The first, second, fifth, and sixth positions can hold x or S;
the third and fourth positions can hold θ or Ω. I have lumped
sequences by the number of xs, Ss, θs, and Ωs.

For example xxθΩSS stands for:
[xxθΩSS, xSθΩxS, xSθΩSx, SSθΩxx, SxθΩxS, SxθΩSx,
 xxΩθSS, xSΩθxS, xSΩθSx, SSΩθxx, SxΩθxS, SxΩθSx]

The following table shows the likelihood that
any particular DNA sequence will fall into one of the
defined classes.

Library size = 1.0    Sampling = .00001%

total      1.0000E+00    % sampled

xxθθxx     3.1524E-01    xxθΩxx     [illegible]
xxΩΩxx     4.1684E-02    xxθθxS     1.8013E-01
xxθΩxS     [illegible]   xxΩΩxS     [illegible]
xxθθSS     3.8600E-02    xxθΩSS     [illegible]
xxΩΩSS     5.1042E-03    xSθθSS     3.6762E-03
xSθΩSS     2.6736E-03    xSΩΩSS     [illegible]
SSθθSS     1.3129E-04    SSθΩSS     9.5486E-05
SSΩΩSS     1.7361E-05
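
The class sizes and probabilities of Table 131 follow from simple counting. The Python sketch below is an illustration, not part of the original disclosure; the names `class_counts`, `class_probability` and `expected_distinct` are hypothetical. It assumes that serine is the only residue encoded twice by NNT (14 other residues are encoded once), that L and R are each encoded twice by NNG (11 other residues are encoded once), that the NNG stop codon is excluded, and that sampling follows the same 1-exp(-k*A) form used for Table 130.

```python
from math import comb, exp

# Stop-free DNA sequences: 16 NNT codons at four positions, 15 non-stop NNG codons at two
N_DNA = 16**4 * 15**2


def class_counts():
    """Yield (n_S, n_omega, amino-acid sequences in class, DNA sequences per amino-acid sequence)."""
    for n_s in range(5):          # serines among the four NNT positions
        for n_omega in range(3):  # L or R among the two NNG positions
            n_sequences = (comb(4, n_s) * 14**(4 - n_s)
                           * comb(2, n_omega) * 2**n_omega * 11**(2 - n_omega))
            yield n_s, n_omega, n_sequences, 2**(n_s + n_omega)


def class_probability(n_s, n_omega):
    """Probability that one random stop-free DNA sequence falls in the given class."""
    for s, o, n_sequences, dna_per_sequence in class_counts():
        if (s, o) == (n_s, n_omega):
            return n_sequences * dna_per_sequence / N_DNA
    raise ValueError("no such class")


def expected_distinct(lsize):
    """Expected number of distinct amino-acid sequences among lsize transformants."""
    a = lsize / N_DNA
    return sum(n * (1 - exp(-k * a)) for _, _, n, k in class_counts())
```

Under these assumptions `class_probability(0, 0)` corresponds to the xxθθxx entry and `class_probability(4, 2)` to the SSΩΩSS entry.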






EP 1 279 731 A1



Table 131: Sampling of a Library
Encoded by (NNT)^4(NNG)^2
(continued)

The following sections show how many sequences
of each class are expected for libraries of different sizes.



Library size = 1.0000E+05
total = 9.9137E+04    fraction sampled = 1.1587E-02

Type       Number (%)            Type       Number (%)
xxθθxx     31416.9 (  0.7)     xxθΩxx     22771.4 (  1.3)
xxΩΩxx      4112.4 (  2.7)     xxθθxS     17891.8 (  1.3)
xxθΩxS     12924.6 (  2.7)     xxΩΩxS      2318.5 (  5.3)
xxθθSS      3808.1 (  2.7)     xxθΩSS      2732.5 (  5.3)
xxΩΩSS       483.7 ( 10.3)     xSθθSS       357.8 (  5.3)
xSθΩSS       253.4 ( 10.3)     xSΩΩSS        43.7 ( 19.5)
SSθθSS        12.4 ( 10.3)     SSθΩSS         8.6 ( 19.5)
SSΩΩSS         1.4 ( 35.2)



Library size = 1.0000E+06
total = 9.2064E+05    fraction sampled = 1.0761E-01

xxθθxx    304783.9 (  6.6)     xxθΩxx    214394.0 ( 12.7)
xxΩΩxx     36508.6 ( 23.8)     xxθθxS    168452.5 ( 12.7)
xxθΩxS    114741.4 ( 23.8)     xxΩΩxS     18383.8 ( 41.9)
xxθθSS     33807.7 ( 23.8)     xxθΩSS     21666.6 ( 41.9)
xxΩΩSS      3114.6 ( 66.2)     xSθθSS      2837.3 ( 41.9)
xSθΩSS      1631.5 ( 66.2)     xSΩΩSS       198.4 ( 88.6)
SSθθSS        80.1 ( 66.2)     SSθΩSS        39.0 ( 88.6)
SSΩΩSS         3.9 ( 98.7)



Library size = 3.0000E+06
total = 2.3880E+06    fraction sampled = 2.7912E-01

xxθθxx    855709.5 ( 18.4)     xxθΩxx    565051.6 ( 33.4)
xxΩΩxx     85564.7 ( 55.7)     xxθθxS    443969.1 ( 33.4)
xxθΩxS    268917.8 ( 55.7)     xxΩΩxS     35281.3 ( 80.4)
xxθθSS     79234.7 ( 55.7)     xxθΩSS     41581.5 ( 80.4)
xxΩΩSS      4522.6 ( 96.1)     xSθθSS      5445.2 ( 80.4)
xSθΩSS      2369.0 ( 96.1)     xSΩΩSS       223.7 ( 99.9)
SSθθSS       116.3 ( 96.1)     SSθΩSS        43.9 ( 99.9)
SSΩΩSS         4.0 (100.0)

Table 131: Sampling of a Library
Encoded by (NNT)^4(NNG)^2
(continued)



Library size = 8.5556E+06
total = 4.9303E+06    fraction sampled = 5.7626E-01

xxθθxx   2046301.0 ( 44.0)     xxθΩxx   1160645.0 ( 68.7)
xxΩΩxx    138575.9 ( 90.2)     xxθθxS    911935.6 ( 68.7)
xxθΩxS    435524.3 ( 90.2)     xxΩΩxS     43480.7 ( 99.0)
xxθθSS    128324.1 ( 90.2)     xxθΩSS     51245.1 ( 99.0)
xxΩΩSS      4703.6 (100.0)     xSθθSS      6710.7 ( 99.0)
xSθΩSS      2463.8 (100.0)     xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)     SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)



Library size = 1.0000E+07
total = 5.3667E+06    fraction sampled = 6.2727E-01

xxθθxx   [illegible]           xxθΩxx   [illegible] ( 74.2)
xxΩΩxx    143467.0 ( 93.4)     xxθθxS    985974.9 ( 74.2)
xxθΩxS    450896.3 ( 93.4)     xxΩΩxS     43710.7 ( 99.6)
xxθθSS    132853.4 ( 93.4)     xxθΩSS     51516.1 ( 99.6)
xxΩΩSS      4703.9 (100.0)     xSθθSS      6746.2 ( 99.6)
xSθΩSS      2464.0 (100.0)     xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)     SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)



Library size = 3.0000E+07
total = 7.8961E+06    fraction sampled = 9.2291E-01

xxθθxx   4040589.0 ( 86.9)     xxθΩxx   1661409.0 ( 98.3)
xxΩΩxx    153619.1 (100.0)     xxθθxS   1305393.0 ( 98.3)
xxθΩxS    482802.9 (100.0)     xxΩΩxS     43904.0 (100.0)
xxθθSS    142254.4 (100.0)     xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)     xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)     xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)     SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Table 131: Sampling of a Library
Encoded by (NNT)^4(NNG)^2
(continued)



Library size = 5.0000E+07
total = 8.3956E+06    fraction sampled = 9.8130E-01

xxθθxx   4491779.0 ( 96.6)     xxθΩxx   1688387.0 ( 99.9)
xxΩΩxx    153663.8 (100.0)     xxθθxS   1326590.0 ( 99.9)
xxθΩxS    482943.4 (100.0)     xxΩΩxS     43904.0 (100.0)
xxθθSS    142295.8 (100.0)     xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)     xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)     xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)     SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)



Library size = 1.0000E+08
total = 8.5503E+06    fraction sampled = 9.9938E-01

xxθθxx   4643063.0 ( 99.9)     xxθΩxx   1690302.0 (100.0)
xxΩΩxx    153664.0 (100.0)     xxθθxS   1328094.0 (100.0)
xxθΩxS    482944.0 (100.0)     xxΩΩxS     43904.0 (100.0)
xxθθSS    142296.0 (100.0)     xxθΩSS     51744.0 (100.0)
xxΩΩSS      4704.0 (100.0)     xSθθSS      6776.0 (100.0)
xSθΩSS      2464.0 (100.0)     xSΩΩSS       224.0 (100.0)
SSθθSS       121.0 (100.0)     SSθΩSS        44.0 (100.0)
SSΩΩSS         4.0 (100.0)

Table 132: Relative efficiencies of
various simple variegation codons



                            Number of codons
                      5              6              7
                  #DNA/#AA       #DNA/#AA       #DNA/#AA
vgCodon           [#DNA]         [#DNA]         [#DNA]
                  (#AA)          (#AA)          (#AA)

NNK                 8.95          13.86          21.49
assuming         [2.86·10^7]    [8.87·10^8]    [2.75·10^10]
stops vanish     (3.2·10^6)     (6.4·10^7)     (1.28·10^9)

NNT                 1.38           1.47           1.57
                 [1.05·10^6]    [1.68·10^7]    [2.68·10^8]
                 (7.59·10^5)    (1.14·10^7)    (1.71·10^8)

NNG                 2.04           2.36           2.72
assuming         [7.59·10^5]    [1.14·10^7]    [1.71·10^8]
stops vanish     (3.7·10^5)     (4.83·10^6)    (6.27·10^7)
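
The ratios in Table 132 can be reproduced from the number of codons and the number of distinct amino acids that each variegated codon encodes. The Python sketch below is an illustration, not part of the original disclosure; the dictionary `CODONS` and the function `dna_per_aa` are hypothetical names. It assumes 31 stop-free codons and 20 amino acids for NNK, 16 codons and 15 amino acids for NNT, and 15 stop-free codons and 13 amino acids for NNG.

```python
# vgCodon -> (DNA codons counted, distinct amino acids encoded)
CODONS = {
    "NNK": (31, 20),  # 32 codons; the stop codon TAG is assumed to vanish
    "NNT": (16, 15),  # no stop codons
    "NNG": (15, 13),  # 16 codons; the stop codon TAG is assumed to vanish
}


def dna_per_aa(vg_codon, n_codons):
    """Ratio of DNA sequences to amino-acid sequences for n_codons variegated codons."""
    n_dna, n_aa = CODONS[vg_codon]
    return n_dna**n_codons / n_aa**n_codons
```

For example, `dna_per_aa("NNT", 6)` is about 1.47, whereas `dna_per_aa("NNK", 6)` is nearly 14: under these assumptions the NNT library spends far fewer transformants on redundant DNA sequences.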
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Table 155 

Distance in Å between alpha carbons in octapeptides

Extended Strand: angle of Cα1-Cα2-Cα3 = 138°

       1      2      3      4      5      6      7      8
1
2     3.8
3     7.1    3.8
4    10.7    7.1    3.8
5    14.2   10.7    7.1    3.8
6    17.7   14.1   10.7    7.1    3.8
7    21.2   17.7   14.1   10.6    7.0    3.8
8    24.6   20.9   17.5   13.9   10.6    7.0    3.8

Reverse turn between residues 4 and 5.

       1      2      3      4      5      6      7      8
1
2     3.8
3     7.1    3.8
4    10.6    7.0    3.8
5    11.6    8.0    6.1    3.8
6     9.0    5.8    5.5    5.6    3.8
7     6.2    4.1    6.3    8.0    7.0    3.8
8     5.8    6.0    9.1   11.6   10.7    7.2    3.8

Alpha helix: angle of Cα1-Cα2-Cα3 = 93°

       1      2      3      4      5      6      7      8
1
2     3.8
3     5.5    3.8
4     5.1    5.4    3.8
5     6.6    5.3    5.5    3.8
6     9.3    7.0    5.6    5.5    3.8
7    10.4    9.3    6.9    5.4    5.5    3.8
8    11.3   10.7    9.5    6.8    5.6    5.6    3.8
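
The 1-3 entries of Table 155 follow from the quoted Cα1-Cα2-Cα3 angle alone. The Python sketch below is an illustration, not part of the original disclosure; the function name is hypothetical. It assumes that adjacent alpha carbons are 3.8 Å apart (the diagonal of the table), so that Cα(i), Cα(i+1) and Cα(i+2) form an isosceles triangle.

```python
from math import radians, sin

CA_CA = 3.8  # distance in Å between adjacent alpha carbons (diagonal of Table 155)


def ca_1_3_distance(angle_deg):
    """Distance in Å between Cα(i) and Cα(i+2) for a given Cα1-Cα2-Cα3 angle in degrees."""
    # Base of an isosceles triangle with two sides of length CA_CA and apex angle angle_deg
    return 2 * CA_CA * sin(radians(angle_deg) / 2)
```

With 138° this gives roughly 7.1 Å, and with 93° roughly 5.5 Å, in line with the extended-strand and alpha-helix entries for residues 1 and 3.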
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Table 156 

Distances between alpha carbons in closed mini-proteins of
the form disulfide cyclo(CXXXXC)

Minimum distance

       1      2      3      4      5      6
1
2     3.8
3     5.9    3.8
4     5.6    6.0    3.8
5     4.7    5.9    6.0    3.8
6     4.8    5.3    5.1    5.2    3.8

Average distance

       1      2      3      4      5      6
1
2     3.8
3     6.3    3.8
4     7.5    6.4    3.8
5     7.1    7.5    6.3    3.8
6     5.6    7.5    7.7    6.4    3.8

Maximum distance

       1      2      3      4      5      6
1
2     3.8
3     6.7    3.8
4     9.0    6.9    3.8
5     8.7    8.8    6.8    3.8
6     6.6    9.2    9.1    6.8    3.8

Table 820: Peptide Phage

Antibiotic 

Putative Streptavidin • Resistance 
name Binding Peptide Seo. WftyH^y 

BPQ ABGPCRPQF - - CQSYIBORIV - - - - E... 
DEV(F) AB- PCHPQYRLCQRPLKQPPPPPPAE... 

Dev(B) AB - LCBPQFPRCNLFREVPPPPPPAE... 

BPQ6 ABePCHPQPPRCYIBGRZV - B. .. 

11111111112222222 
12345678901234567890123456 
- - - - C C B 
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Table 838: Streptavidin-binding 
disulfide-constrained peptides

#2     glu gly tyr           cys his pro gln phe cys pro ser     4
#4     glu gly [illegible]   cys his pro gln phe cys ser ser     3
#5     glu gly leu           cys his pro gln phe cys gly ser     2
#8     glu gly asp           cys his pro gln phe cys ser ser     2
#1     glu gly asn           cys his pro gln phe cys pro ser     1
#3     glu gly asp           cys his pro gln phe cys arg ser     1
#13    glu gly asp           cys his pro gln phe cys val ser     1

consensus                    cys his pro gln phe cys

Table 839: Sequences Obtained by
Enrichment over BSA

#21    glu gly gly [illegible] [illegible]   lys arg asn cys tyr ser
#22    glu gly his cys asp                   lys lys ile cys leu ser
#23    glu gly [illegible]                   thr ala ala cys phe ser
#24    glu gly his cys tyr                   lys gly val cys ser ser
#25    gln gly his cys asp                   lys trp arg cys pro ser
#26    glu gly ile cys tyr                   arg leu asp cys ile ser
#27    gln gly gly cys phe                   pro trp his cys phe ser
#28    glu gly ser cys asp                   [illegible] leu arg cys asp ser

No consensus observed.
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[0309] Preferred aspects and embodiments of the invention are as summarised in the following: 

1. In a process for developing novel binding proteins with a desired binding activity against a particular target
material comprising providing a population of genetic packages, each displaying one or more copies of a particular
potential binding domain as part of a chimeric outer surface protein thereof, said potential binding domain not being
natively associated with the outer surface of said package, said population collectively displaying a plurality of
different potential binding domains, the differentiation among said plurality of different potential binding domains
occurring through the at least partially random variation of one or more predetermined amino acid positions, but
not all amino acid positions, of said parental binding domain to randomly obtain at each said variable position an
amino acid belonging to a predetermined set of two or more amino acids, the amino acids of said set occurring at
said position in predetermined expected proportions; contacting the packages with the target material; and
separating the packages according to their affinity for said target material;

the improvement comprising essentially each said potential binding domain being a mini-protein sequence
of less than forty amino acids and having at least one intrachain covalent crosslink between at least a first amino
acid position and a second amino acid position thereof, the amino acids at said first and second positions being
invariant in all of the chimeric proteins displayed by said population, with those residues which participate in the
formation of a covalent crosslink being invariant throughout said population, with the proviso that when the crosslink
is in the form of a disulfide bond, the potential binding domain is a micro-protein sequence of less than forty amino
acids.

2. The method in item 1 wherein the crosslink is a disulfide bond and the amino acids at the first and second
amino acid positions are cysteines.

3. The method in item 2 in which the micro-protein domain has a single disulfide bond and the span of the bond
is not more than nine amino acid residues.

4. The method in item 2 in which the micro-protein domain has a single disulfide bond, wherein the disulfide bond
bridges a sequence of amino acids which under affinity separation conditions collectively assume a hairpin
supersecondary structure.

5. The method in item 4 wherein the hairpin secondary structure is selected from the group consisting of (a) an α
helix, a turn, and a β strand; (b) an α helix, a turn, and an α helix; and (c) a β strand, a turn, and a β strand.

6. The method in item 2 wherein the micro-protein domain comprises two intrachain disulfide bonds and preferably
includes two clustered cysteines.

7. The method in item 6 wherein the micro-protein domain has two disulfide bonds having a connectivity pattern
of 1-3, 2-4.

8. The method in item 2 wherein the micro-protein domain comprises three intrachain disulfide bonds and preferably
includes two clustered cysteines.

9. The method in item 8 wherein the micro-protein domain has three disulfide bonds having a connectivity pattern
of 1-4, 2-5, 3-6.

10. The method in item 7 wherein the micro-protein domain substantially corresponds in sequence to an α-conotoxin.

11. The method in item 9 wherein the micro-protein domain substantially corresponds in sequence to a mu- or
omega-conotoxin.

12. The method in item 6 wherein the micro-protein domain substantially corresponds in sequence to a micro-
protein selected from the group consisting of Escherichia coli heat stable toxin I (ST-Ia), the bee venom apamin, or
a squash-seed trypsin inhibitor, the scorpion toxin, charybdotoxin and secretory leukocyte protease inhibitor.

13. The method in item 1 wherein the covalent crosslink includes a metal atom, such as zinc, iron, copper or cobalt.

14. The method in any of items 1-13 wherein at least one variable amino acid position in said potential binding
domains was encoded by a simply variegated codon selected from the group consisting of NNT, NNG, RNG, RMG,
VNT, RRS, and SNT.

15. The method in any of items 1-13 wherein none of the variable amino acid positions in said potential binding
domain was encoded by a simply variegated codon selected from the group consisting of NNN, NNK and NNS.

16. The method in any of items 1-13 wherein at least one variable amino acid position in said potential binding
domains was encoded by a complexly variegated codon.

17. The method in any of items 1-16 wherein the replicable genetic package is a phage, preferably a DNA phage
other than phage lambda, more preferably a filamentous phage.

18. The method in item 17 wherein the potential binding domain is fused with the major coat protein of a filamentous
phage or an assemblable fragment thereof, or with the gene III protein of a filamentous phage or an assemblable
fragment thereof.

19. The method in any of items 1-16 wherein the replicable genetic package is a bacterial cell, such as strains of
Escherichia coli, Salmonella typhimurium, Pseudomonas aeruginosa, Klebsiella pneumoniae, Neisseria gonorrhoeae,
or Bacillus subtilis, said DNA construct further comprises a periplasmic secretion signal sequence, and the
potential binding domain is fused with a bacterial outer surface protein such as the lamB protein, OmpA, OmpC,
OmpF, Phospholipase A, or pilin, or an assemblable segment thereof.

20. The method in any of items 1-19 wherein said population is characterized by the display of at least 10^[illegible] different
potential binding domains, and wherein, for any potentially encoded potential binding domain, the probability that
it will be displayed by at least one package in said population is at least 50%, more preferably at least 90%.

21. A library of display phage or cells, each displaying one or more copies of a particular potential binding domain
as part of a chimeric outer surface protein thereof, said potential binding domain not being natively associated with
the outer surface of said phage or cells, said library collectively displaying a plurality of different potential binding
domains, the differentiation among said plurality of different potential binding domains occurring through the at
least partially random variation of one or more predetermined amino acid positions, but not all amino acid positions,
of said parental binding domain to randomly obtain at each said variable position an amino acid belonging to a
predetermined set of two or more amino acids, the amino acids of said set occurring at said position in
predetermined expected proportions,

essentially each said potential binding domain being a mini-protein sequence of less than sixty amino acids and having
at least one intrachain covalent crosslink between at least a first amino acid position and a second amino acid position
thereof, the amino acids at said first and second positions being invariant in all of the chimeric proteins displayed by
said population, with those residues which participate in the formation of a covalent crosslink being invariant throughout
said population, with the proviso that when the crosslink is a disulfide bond, the potential binding domain is a
micro-protein of less than 40 residues.
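
The probability threshold in item 20 can be turned into a library-size estimate. The Python sketch below is an illustration, not part of the original disclosure; the function name and parameters are hypothetical. It assumes the Poisson sampling model of Table 130, in which a sequence encoded by k of the N possible DNA sequences is present in a library of Lsize independent transformants with probability 1-exp(-k*Lsize/N).

```python
from math import log


def required_library_size(probability, n_dna_sequences, dna_per_sequence=1):
    """Number of independent transformants needed so that a given amino-acid
    sequence is present with at least the stated probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    # Solve 1 - exp(-k * Lsize / N) >= probability for Lsize
    return -log(1 - probability) * n_dna_sequences / dna_per_sequence
```

For the least-favoured class of an (NNK)^6 library (one DNA sequence per amino-acid sequence, 31**6 stop-free DNA sequences), `required_library_size(0.5, 31**6)` is roughly 6.2E+08 and `required_library_size(0.9, 31**6)` is roughly 2.0E+09.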

Claims

1. A process for identifying proteins with a desired binding activity against a target which comprises

(a) screening, for binding activity against said target, a population of genetic packages, each package
displaying a potential binding domain, said population collectively displaying a plurality of different potential binding
domains, said domains differing at one or more variable amino acid positions,

each said potential binding domain being a micro-protein sequence of less than forty amino acids and having
a single disulfide bond between a first amino acid position and a second amino acid position thereof, the amino
acids at said first and second positions being invariant cysteines in the potential binding domains displayed
by said population, and

(b) identifying a protein having the desired binding activity against said target.

2. The process of claim 1 wherein the span of the disulfide bond is two to about nine amino acids.

3. The process of claim 1 wherein the span of the disulfide bond is 2 or 3.

4. The process of claim 3 wherein one of the spanned residue positions is Gly in at least some of the domains
displayed by said population.

5. The process of claim 3 wherein one of the spanned residue positions is Gly in essentially all of the domains
displayed by said population.

6. The process of claim 1 wherein the span of the disulfide bond is 4.

7. The process of claim 1 wherein the span of the disulfide bond is 5.

8. The process of claim 1 wherein the span of the disulfide bond is 6.

9. The process of claim 8 wherein at least one of the spanned amino acid positions is a main-chain
geometry-constraining amino acid, in each case independently selected from the group consisting of proline, valine and
isoleucine, in at least some of the domains of said population.

10. The process of claim 8 wherein at least one of the spanned amino acid positions is a main-chain
geometry-constraining amino acid, in each case independently selected from the group consisting of proline, valine and
isoleucine, in essentially all of the domains of said population.

11. The process of claim 1 where the span of the disulfide bond is 7-9.

12. The process of claim 8 wherein at least two of the spanned amino acid positions are main-chain
geometry-constraining amino acids, in each case independently selected from the group consisting of proline, valine and
isoleucine, in at least some of the domains of said population.

13. The process of claim 8 wherein at least two of the spanned amino acid positions are main-chain
geometry-constraining amino acids, in each case independently selected from the group consisting of proline, valine and
isoleucine, in essentially all of the domains of said population.

14. The process of any one of claims 9, 10, 12 and 13 where the constraining amino acid is proline.

15. The process of any one of claims 9, 10, 12, 13 and 14 where the constraining amino acid is at a spanned residue
position immediately adjacent to the first or second position.

16. The process of any one of claims 1-15 where said micro-protein sequence is no more than 20 amino acids.

17. The process of any one of claims 1-16 wherein 4-8 of the amino acid positions of said micro-protein sequence,
other than said first and second positions, are variable positions.

18. The process of any one of claims 1-17 wherein, for at least one variable amino acid position, the set of amino acids
occurring at that position includes at least eight different amino acids.

19. The process of claim 18 wherein for all variable amino acid positions, the set of amino acids occurring at that
position includes at least eight different amino acids.

20. The process of any one of claims 1-19 wherein cysteine is not allowed at any variable amino acid position.

21. The process of any one of claims 1-20 wherein one or more of the spanned amino acid positions are variable
amino acid positions.

22. The process of any one of claims 1-20, wherein all of the spanned amino acid positions which are not required to
be main chain geometry constraining amino acids are variable amino acid positions.

23. The process of any one of claims 1-22 wherein one or more of the non-spanned amino acid positions immediately
adjacent to the first or second positions are variable amino acid positions.

24. The process of any one of claims 1-23 where said micro-protein sequence comprises the sequence

X1-C-X2-X3-X4-X5-C-X6

wherein X1 through X6 are variable amino acid positions, which may be varied independently [illegible].

25. The process of claim 1 wherein the disulfide bond bridges a sequence of amino acids which under affinity separation
conditions collectively assume a hairpin secondary structure.

26. The process of claim 25 in which that hairpin secondary structure is an alpha helix, a turn, and a beta strand.

27. The process of claim 25 in which that hairpin secondary structure is an alpha helix, a turn, and an alpha helix.

28. The process of claim 25 in which that hairpin secondary structure is a beta strand, a turn, and a beta strand.

29. The process of any one of claims 26-28 wherein no more than three amino acids lie between the cysteine of said
first or second position and the beginning or end of the nearest alpha helix or beta strand, as applicable.

30. The process of any one of d£iims 25-29 wherein, ff a variable amino acid position is within an alpha helix or beta 
strand secondary structure, the set of amino acids for that position is limited to those unlllcely to dismpt said sec- 

5 ondary structure. 

31. The process of claim 26 where said micro-protein sequences comprise a sequence which differs by not more than
seven amino acids from residues 22-41 of scorpion neurotoxin, variant 3, or from a Cys29->Gln mutant thereof.

32. The process of any one of claims 1-31 where the potential binding domain is a chimeric outer surface protein and
said potential binding domain is not natively associated with the outer surface of said genetic package.

33. The process of any one of claims 1-32 where the variable amino acid positions are predetermined.

34. The process of claim 33 where the differentiation among said plurality of different potential binding domains occurs
through the at least partially random variation of said variable amino acid positions, to randomly obtain at each
said variable position an amino acid belonging to a predetermined set of two or more amino acids, the amino acids
of said set occurring at each such variable position in predetermined expected proportions.

35. The process of any one of claims 1-34 where said screening comprises contacting the packages with the target;
and separating the packages according to their affinity for said target.

36. The process of any of claims 1-35 wherein at least one variable amino acid position in said potential binding
domains was encoded by a simply variegated codon selected from the group consisting of NNT, NNG, RNG, RMG,
VNT, RRS, and SMT.

37. The process of any of claims 1^ wherein none of the variable amino acid positions in said potential binding
domain was encoded by a simply variegated codon selected from the group consisting of NNN, NNK and NNS.
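The practical difference between the variegated codons named in these claims (for example NNG, RNG, RRS and SMT versus NNN, NNK and NNS) can be seen by expanding each into the concrete codons it represents. The following is a minimal sketch under stated assumptions: it uses the standard IUPAC nucleotide ambiguity codes and the standard genetic code, and the helper names `expand` and `summarize` are hypothetical and do not come from the patent text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code; bases are ordered T, C, A, G at every codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand(variegated):
    """List every concrete codon represented by a three-letter variegated codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in variegated.upper()))]


def summarize(variegated):
    """Count codons and stop codons, and collect the distinct amino acids encoded."""
    encoded = [CODON_TABLE[c] for c in expand(variegated)]
    return {
        "codons": len(encoded),
        "amino_acids": "".join(sorted(set(encoded) - {"*"})),
        "stops": encoded.count("*"),
    }
```

Calling `summarize("NNK")` reports 32 codons covering all twenty amino acids plus one stop codon, whereas `summarize("SMT")` reports four codons encoding only Ala, Asp, His and Pro with no stop codon.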

38. The process of any one of claims 1-37 wherein the genetic package is a virus.

39. The process of claim 38 where the virus is a DNA phage other than phage lambda.

40. The process of claim 39 where the genetic package is a filamentous phage.

41. The process of claim 40 where the outer surface protein is the major coat protein of a filamentous phage or an
assemblable fragment thereof.

42. The process of claim 40 where the outer surface protein is the gene III protein of a filamentous phage, or an
assemblable fragment thereof.

43. The process of any one of claims 40-42 where the phage is M13, f1, fd, If1, Ike, Xf, Pf1 or Pf3 phage.

44. The process of any one of claims 1-37 where the genetic package is a bacterial cell.

45. The process of claim 44 wherein the bacterial cell is a strain of Escherichia coli, Salmonella typhimurium,
Pseudomonas aeruginosa, Klebsiella pneumoniae, Neisseria gonorrhoeae, or Bacillus subtilis.

46. The process of claim 45 wherein the outer surface protein is the lamB protein, OmpA, OmpC, OmpF, Phospholipase
A, or pilin, or an assemblable segment thereof.

47. The process of any one of claims 1 ^ in which at least 6^ different potential binding domains are displayed.

48. The process of any of claims 1-47 where not more than 20^^ different potential binding domains are displayed. 

49. The process of claim 48 where not more than 2(fi different potential binding domains are displayed.

50. The process of any one of claims 1-49 wherein the target is a serine protease.

51. The process of claim 50 wherein the target is human neutrophil elastase.

52. The process of any one of claims 1-49 wherein the target is a polypeptide.

53. The process of any one of claims 1-49 wherein the target is a polynucleic acid. 

54. The process of any one of claims 1-49 wherein the target is a lipid or a polysaccharide. 

55. The process of any one of claims 1-49 wherein the target is an enzyme.

56. The process of any one of claims 1-49 wherein the target is a receptor. 

57. A library of micro-proteins, each micro-protein being less than forty amino acids and having a single disulfide bond
between a cysteine at a first amino acid position and a cysteine at a second amino acid position thereof, said
library collectively providing a plurality of different microproteins, differing at one or more variable amino acid
positions, but not at said first and second positions.

58. A library of genetic packages, each displaying one or more copies of a particular potential binding domain, said
library collectively displaying a plurality of different potential binding domains, said domains differing at one or more
variable amino acid positions, each said potential binding domain being a micro-protein of less than forty amino
acids and having a single disulfide bond between a first amino acid position and a second amino acid position
thereof, the amino acids at said first and second positions being invariant cysteines in the domains displayed by
said library.

59. The library of claim 57 or 58 wherein the span of the disulfide bond is two to about nine amino acids.

60. The library of claim 57 or 58 wherein the span of the disulfide bond is 2 or 3.

61. The library of claim 60 wherein one of the spanned residue positions is Gly in at least some of the micro-proteins
displayed by said library.

62. The library of claim 60 wherein one of the spanned residue positions is Gly in essentially all of the micro-proteins
displayed by said library. 

63. The library of claim 57 or 58 wherein the span of the disulfide bond is 4.

64. The library of claim 57 or 58 wherein the span of the disulfide bond is 5. 

65. The library of claim 57 or 58 wherein the span of the disulfide bond is 6.

66. The library of claim 65 wherein at least one of the spanned amino acid positions is a main-chain geometry-constraining
amino acid, in each case independently selected from the group consisting of proline, valine and isoleucine,
in at least some of the micro-proteins of said library.

67. The library of claim 66 wherein at least one of the spanned amino acid positions is a main-chain geometry-constraining
amino acid, in each case independently selected from the group consisting of proline, valine and isoleucine,
in essentially all of the micro-proteins of said library.

68. The library of claim 57 or 58 where the span of the disulfide bond is 7-9.

69. The library of claim 57 or 68 wherein at least two of the spanned amino acid positions are main-chain geometry-constraining
amino acids, in each case independently selected from the group consisting of proline, valine and
isoleucine, in at least some of the micro-proteins of said library.

70. The library of claim 67 or 58 wherein at least two of the spanned amino acid positions are main-chain geometry-constraining
amino acids, in each case independently selected from the group consisting of proline, valine and
isoleucine, in essentially all of the micro-proteins of said library.

71. The library of any one of claims 66, 67, 69 and 70 where the constraining amino acid is proline.

72. The library of any one of claims 66, 67, 69, 70 and 53 where the constraining amino acid is at a spanned residue 
position immediately adjacent to the first or second position. 

73. The library of any one of claims 57-72 where said micro-protein is not more than 20 amino acids.

74. The library of any one of claims 57-73 wherein 4-8 of the amino acid positions of said micro-protein, other than 
said first and second positions, are variable positions. 

75. The library of any one of claims 57-74 wherein, for at least one variable amino acid position, the set of amino acids
occurring at that position includes at least eight different amino acids.

76. The library of claim 75 wherein for all variable amino acid positions, the set of amino acids occurring at that position
includes at least eight different amino acids.
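For orientation, the number of distinct micro-protein sequences a library can contain follows directly from the number of variable positions and the size of the amino acid set allowed at each. The sketch below is only a back-of-the-envelope illustration; it assumes the positions are varied independently of one another, and the function name `library_size` is hypothetical.

```python
from math import prod


def library_size(set_sizes):
    """Theoretical number of distinct sequences, given the amino acid set size at each variable position."""
    return prod(set_sizes)


# Four variable positions with eight allowed amino acids each.
smallest = library_size([8] * 4)
# Eight variable positions with all twenty amino acids allowed at each.
largest = library_size([20] * 8)
```

Under these assumptions, four to eight variable positions with at least eight amino acids each give between 4,096 and 25,600,000,000 possible sequences.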

77. The library of any one of claims 57-76 wherein cysteine is not allowed at any variable amino acid position. 

78. The library of any one of claims 57-77 wherein one or more of the spanned amino acid positions are variable amino
acid positions.

79. The library of any one of claims 57-77, wherein all of the spanned amino acid positions which are not required to
be main chain geometry constraining amino acids are variable amino acid positions.

80. The library of any one of claims 57-79 wherein one or more of the non-spanned amino acid positions immediately
adjacent to the first or second positions are variable amino acid positions.

81. The library of any one of claims 57-79 where said microprotein comprises the sequence

X1-C-X2-X3-X4-X5-C-X6

wherein X1 through X6 are variable amino acid positions, which may be varied independently of each other.

82. The library of claim 57 or 58 wherein the disulfide bond bridges a sequence of amino acids which under affinity
separation conditions collectively assume a hairpin secondary structure.

83. The library of claim 82 in which that hairpin secondary structure is an alpha helix, a turn, and a beta strand.

84. The library of claim 82 in which that hairpin secondary structure is an alpha helix, a turn, and an alpha helix.

85. The library of claim 82 in which that hairpin secondary structure is a beta strand, a turn, and a beta strand.

86. The library of any one of claims 83-85 wherein no more than three amino acids lie between the cysteine of said
first or second position and the beginning or end of the nearest alpha helix or beta strand, as applicable.

87. The library of any one of claims 82-86 wherein, if a variable amino acid position is within an alpha helix or beta
strand secondary structure, the set of amino acids for that position is limited to those unlikely to disrupt said
secondary structure.

88. The library of claim 83 where said micro-protein comprises a sequence which differs by not more than seven amino
acids from residues 22-41 of scorpion neurotoxin, variant 3, or from a Cys29->Gln mutant thereof.

89. The library of any one of claims 58-88 where the micro-protein is displayed as part of a chimeric outer surface
protein and said micro-protein is not natively associated with the outer surface of said genetic package.

90. The library of any one of claims 58-89 where the variable amino acid positions are predetermined.

91. The library of claim 90 where the differentiation among said plurality of different micro-proteins occurs through the
at least partially random variation of one or more predetermined variable amino acid positions, to randomly obtain
at each said variable position an amino acid belonging to a predetermined set of two or more amino acids, the
amino acids of said set occurring at said position in predetermined expected proportions.

92. The library of claim 58 where the genetic package is a filamentous phage.

93. The library of claim 92 where the outer surface protein is the major coat protein of a filamentous phage or an
assemblable fragment thereof.

94. The library of claim 92 where the outer surface protein is the gene III protein of a filamentous phage, or an
assemblable fragment thereof.

95. The library of any one of claims 92-94 where the phage is M13, f1, fd, If1, Ike, Xf, Pf1 or Pf3 phage.

96. An isolated, non-naturally occurring protein comprising

(a) a micro-protein sequence of less than forty amino acids and having one and only one disulfide bond between
cysteines at a first amino acid position and a second amino acid position thereof, and
(b) at least a portion of the outer surface protein of a cell or virus, said portion being sufficient to cause the
non-naturally occurring protein to be displayed on the outer surface of said cell or virus when said non-naturally
occurring protein is expressed in said cell or in a cell infected with said virus,

with the proviso that if said outer surface protein is the gene III protein of the filamentous phage M13, said
micro-protein sequence is not DEV(F) or DEV(E) in Table 820.

97. The protein of claim 96 where said outer surface protein is from a virus. 

98. The protein of claim 97 where said virus is a filamentous phage. 

99. The protein of claim 98 wherein said outer surface protein is the gene III protein of a filamentous phage, or an
assemblable fragment thereof.

100. The protein of claim 98 wherein said outer surface protein is the major coat protein of a filamentous phage, or an
assemblable fragment thereof.

101. The protein of claim 96 wherein said outer surface protein is the M13 gene VI, gene VIII or gene IX protein of a
filamentous phage, or an assemblable fragment thereof.

102. The protein of claim 98 where said micro-protein sequence specifically binds streptavidin.

103. The protein of claim 98 where said micro-protein does not specifically bind streptavidin. 

104. The protein of claim 98 which comprises the sequence CXXXXC where X is independently any amino acid and C
is Cys.

105. The protein of claim 96 where the span of said disulfide bond is 5 or 6.

106. The protein of claim 96 which specifically binds a serine protease.

107. An isolated, non-naturally occurring protein of less than forty amino acids which specifically binds streptavidin and
has a single disulfide bond, said protein comprising the sequence CXXXXC where X is independently any amino
acid, and the Cs are cysteines bonded by said disulfide bond.

108. An isolated, non-naturally occurring protein of less than forty amino acids which specifically binds a serine protease
and has a single disulfide bond, said bond having a span of two to about nine amino acids.

109. The protein of claim 106 or 108 where the serine protease is human neutrophil elastase.

110. Use of the protein of claim 109 to inhibit undesired human neutrophil elastase activity.

111. The protein of any one of claims 98-109 where the protein binds to a target sufficiently strongly so that the
dissociation constant of the protein:target complex is less than 10^ moles/liter

112. The protein of any one of claims 96-109 and 111
wherein the micro-protein is sufficiently stable in structure to have a melting point of at least 40°C.

113. A library of non-naturally occurring proteins according to any one of claims 98-106, said library providing a plurality
of different micro-protein sequences.

114. The process of any one of claims 1-49 wherein the target material is streptavidin.

115. The library of any one of claims 57-95 and 113 where at least 8^ different micro-proteins are displayed or provided.

116. The library of any one of claims 57-95, 113 and 115 where not more than 20^ different micro-proteins are displayed
or provided.

117. A DNA mixture which comprises a plurality of DNA molecules, each DNA molecule comprising a coding sequence,
each coding sequence comprising a micro-protein coding sequence encoding a microprotein, the micro-protein
coding sequences of said mixture collectively encoding a plurality of different micro-proteins differing at one or
more variable amino acid positions, each micro-protein being of less than forty amino acids and having a single
disulfide bond between a first amino acid position and a second amino acid position thereof, the amino acids at
said first and second positions being invariant cysteines in the micro-proteins encoded by said micro-protein coding
sequences.

118. The DNA mixture of claim 117 where said coding sequence comprises a sequence encoding a chimeric outer
surface protein, itself comprising said micro-protein.

119. A mixture of expression vectors, each expression vector being a DNA molecule comprising a coding sequence,
each coding sequence comprising a micro-protein coding sequence encoding a microprotein, the micro-protein
coding sequences of said mixture collectively displaying a plurality of different micro-proteins differing at one or
more variable amino acid positions, each micro-protein being of less than forty amino acids and having a single
disulfide bond between a first amino acid position and a second amino acid position thereof, the amino acids at
said first and second positions being invariant cysteines in all of the micro-proteins encoded by said micro-protein
coding sequences.

120. The DNA mixture of claim 119 where said coding sequence comprises a sequence encoding a chimeric outer
surface protein, itself comprising said micro-protein.

121. The process of claim 32 where the micro-protein is genetically fused to an outer surface protein of a cell or virus,
or an assemblable fragment thereof, to form said chimeric outer surface protein.

122. The library of claim 58 where the genetic package is a phage.

123. The process of claim 1 where said population is obtained by transforming a cell culture with a mixture of expression
vectors according to claim 119.

124. The process of claim 123 where said expression vectors are obtained by cloning the DNA molecules of the mixture
of claim 117 into a suitable cloning vector.

125. The process of claim 1 where said identifying comprises recovering a genetic package which binds the target, and
(i) obtaining and sequencing said micro-protein sequence, or (ii) obtaining and sequencing genetic material of said
genetic package which encodes said micro-protein sequence.

126. The process of claim 1 or 125 where said identified protein comprises the micro-protein sequence of a
target-binding, screened potential binding domain.

127. The process of claim 126 where said identified protein is a chimeric outer surface protein comprising said
micro-protein sequence.

128. The process of claim 126 where said identified protein consists of said microprotein sequence of a target-binding
screened potential binding domain.

129. The process of claim 1 where the identified protein is a homologue of said target-binding, screened micro-protein
sequence.

130. The process of claim 129 where said homologue differs from said target-binding screened micro-protein sequence
solely by one or more conservative substitutions.

131. The process of any one of claims 1-37 where the genetic package is a cell.

132. The library of claim 58 where the genetic package is a virus.

133. The library of claim 132 where the virus is a phage. 

134. The library of claim 58 where the genetic package is a cell.

135. The process of claim 38 where the virus is a phage.